The two-sample capture-recapture census when tagging
and sampling are stratified 


By J. N. DARROCH 


Statistical Laboratory, University of Manchester 


1. INTRODUCTION AND SUMMARY 


1-1. We start by recalling the capture-recapture argument used for the simplest type of 
experiment with only two samples and negligible death and emigration rates. 

Let a animals be taken from a population, marked and put back into it. After allowing 
time for these a individuals to ‘mix’ with the others, let a second sample be taken and 
suppose that it comprises b unmarked individuals and c marked ones. Then, if it is assumed 
that every individual has the same probability p of being a member of the second sample, p is
estimated by $\hat p = c/a$ and, if n is the number of unmarked individuals in the population at
the time of the second sample, n is estimated by $b/\hat p = ab/c$. We shall denote this estimate by
$\hat n_P$ and refer to it as the Peterson estimate, although this name is usually given to

$$\hat n_P + a = a(b+c)/c,$$


the estimate of total population size. 
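
The arithmetic of § 1-1 can be sketched in a few lines. This is only an illustration: the function name `peterson_estimates` and the figures a = 200, b = 150, c = 50 are assumptions made for the example and are not data from any experiment discussed here.

```python
from fractions import Fraction


def peterson_estimates(a, b, c):
    """Two-sample estimates from a marked animals, and a second sample
    containing b unmarked and c marked animals (hypothetical helper)."""
    if c <= 0:
        raise ValueError("no recaptures: the estimate is undefined")
    p_hat = Fraction(c, a)              # estimated catching probability c/a
    n_unmarked = Fraction(b) / p_hat    # ab/c, the estimate of n
    n_total = n_unmarked + a            # a(b + c)/c, total population size
    return p_hat, n_unmarked, n_total


p_hat, n_unmarked, n_total = peterson_estimates(a=200, b=150, c=50)
print(p_hat, n_unmarked, n_total)
```

Exact rational arithmetic is used so that the identity $\hat n_P + a = a(b+c)/c$ can be checked without rounding error.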

The italicized assumption is both the essence and the weakness of this and all other capture-recapture
arguments and is implicit in most work on the subject. (See Darroch (1958, 1959)
for a short review of the literature and a discussion of the multiple-recapture census.) In 
practice it can be violated in many ways which may be summarized as follows: 

(i) Animals can differ in their inherent catchability. 

(ii) The catchability of an animal may change after being captured and marked. 

(iii) The probability p can vary geographically over the region occupied by the population, 
partly because the animals are more catchable in one locality than another and also because 
the effort expended in catching them is not uniform over the region. 


1-2. In order to cope with (i) and (ii) one would need more than two samples but, by
contrast, it is possible to adapt the two-sample experiment to allow for (iii) and the adaptations
are the subject of this paper. We shall suppose that, at the time of the second sample,
the region occupied by the population is divided into t subregions or strata in each of which
p can be assumed uniform. We suppose also that, at the time of the first sample, the region can be
divided into s strata where each has the property that when the marked animals are randomly
released within it they have the same probability distribution of moving to the t strata. The
experimenter is required to use a distinctive mark in each of the s strata in order that he can
record the stratum of origin of the recaptured animals.

This subject has been treated in three previous papers known to the author. Most of the 
problems raised here were first raised by Chapman & Junge (1956) and many of the answers 
obtained are the same as or similar to theirs, but we shall not attempt a detailed comparison 
as it would lengthen the present work too greatly. The main difference between their treatment
and ours lies in the models used. They estimated the unknown parameters from sets
of equations relating them to the expected values of the observed frequencies, and only when
it came to finding error variances did they assume equal probabilities and independence of
behaviour for the different animals. We shall start by making the latter assumptions,
estimate by the method of maximum likelihood, and then later relax the independence
assumptions in an exact parametric fashion.

The earliest of the three papers is by Schaeffer (1951), arising from a tagging experiment 
which was stratified temporally instead of spatially. Chapman & Junge pointed out that 
Schaeffer’s estimate was not a consistent one but they did not analyse his data. We do so, 
in §6. 

Beverton & Holt (1957) considered a similar problem but treated it deterministically and 
assumed that the marked animals are released in only one stratum. 


1-3. Summary. The underlying assumptions are that all marked animals released in a 
given first-sample stratum have the same probability distribution of movement to the 
second-sample strata, and that all animals, marked and unmarked, in a given second-sample 
stratum have the same probability of capture. These assumptions are sufficient when s ≥ t
but when s < t it is necessary to assume that the unmarked animals move with the same
probabilities as the marked animals. As one of the main aims of this paper is to keep the
assumptions to a minimum, most of the attention is given to the case s ≥ t and we look at
the case s < t only in § 2-5.

The unknown parameters are the movement probabilities, the catching probabilities, the 
numbers of unmarked animals present in the different second-sample strata and, in particular,
their total. It is usually assumed in two-sample censuses that every marked animal
has probability zero of dying or emigrating in the interval between the two samples or, in 
other words, that the ‘survival probability’ is one. As this assumption is often not justifiable 
we have framed the theory in order that it may be avoided if desired. The price paid for 
dropping this assumption is that, instead of estimating the true (movement and catching) 
probabilities and true sizes, we have to be content with estimating them after they have been 
scaled either up or down by the survival probability. However, these scaled parameters do, 
for the most part, have a useful physical meaning. See §§ 2-1, 2. 

Nearly all estimates are found by the method of maximum likelihood and are therefore 
asymptotically efficient. For s = t the estimates of sizes are given by (8) and (9) and their
variances and covariances by (18) and (19). For s > t, it is possible to allow the survival 
probabilities to depend to a certain extent on the stratum of origin. See § 2-4, (8’), (9’), (18’) 
and (19’). 

In §4 it is shown that if the (hitherto implicit) assumption of independent movement and 
catching of the marked animals is dropped, the estimates are still consistent and the formulae 
for their variances undergo only minor changes. 

In §5 we examine the validity of using the simple Peterson estimate since it would be 
unrealistic to assume that the experimenter always knows about the underlying stratification
and is able to conduct his experiment to fit in with it. Our conclusions there support
most of the previous literature on this question which has been mainly of a heuristic nature. 

Finally, in § 6, most of the theory is illustrated on the data of Schaeffer’s paper. 


2. PROBABILITY MODEL AND ESTIMATION 


2-1. Subscript i will refer to the s first-sample strata and subscript j to the t second-sample 
strata. 


Of the a animals tagged, let $a_i$ be released in the ith stratum and, of these, let $c_{ij}$ be caught
in the jth stratum. Let $c_{i.} = \sum_j c_{ij}$ and $c_{.j} = \sum_i c_{ij}$. Then $\sum_i c_{i.} = \sum_j c_{.j} = c$. Let $\theta^*_{ij}$ denote the
probability of moving to the jth stratum for each of the $a_i$ individuals and write $\sum_j \theta^*_{ij} = \phi^*_i$,
the survival probability. (The reason for using asterisks will appear in § 2-2.) For the most
part, the $\phi^*_i$ will be assumed equal, with common value $\phi^*$ say, since this is usually a
reasonable biological assumption but, in § 2-4 and § 6-3, they will be estimated on the supposition
that they differ in a certain restricted fashion.

Turning to the untagged individuals, we shall assume nothing about their movement
(except in § 2-5) and therefore they need not be brought into the picture until the time of the
second sample. Accordingly let $n^*_j$ denote the number in the jth stratum at that time and
let $n^* = \sum_j n^*_j$. Of the $n^*_j$ let $b_j$ be caught in the second sample.

The fundamental assumption will be that all live individuals in the jth stratum, marked or
unmarked, have the same probability, $p^*_j$ say, of being a member of the second sample.


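2-2. Assuming independent behaviour as regards movement and catching, we have the
following two probability densities.

$$p[\{c_{ij}\}\mid\{a_i\}] = \prod_i \left\{ \frac{a_i!}{(a_i - c_{i.})!\,\prod_j c_{ij}!} \Bigl(1 - \sum_j \theta^*_{ij} p^*_j\Bigr)^{a_i - c_{i.}} \prod_j (\theta^*_{ij} p^*_j)^{c_{ij}} \right\}, \tag{1*}$$

$$p[\{b_j\}] = \prod_j \binom{n^*_j}{b_j} (p^*_j)^{b_j} (1 - p^*_j)^{n^*_j - b_j}. \tag{2*}$$

Thus the likelihood of $\{\theta^*_{ij}\}$, $\{p^*_j\}$, $\{n^*_j\}$ is obtained by multiplying (1*) and (2*), giving
$e^{L^*} = e^{L_1^*} e^{L_2^*}$ say. Differentiating $L^*$ with respect to $p^*_j$ gives

$$\frac{\partial L^*}{\partial p^*_j} = \frac{\partial L_1^*}{\partial p^*_j} + \frac{b_j}{p^*_j} - \frac{n^*_j - b_j}{1 - p^*_j},$$

and maximizing with respect to $n^*_j$ by equating the difference $\Delta_{n^*_j} L^*$ to zero gives

$$\frac{b_j}{p^*_j} = \frac{n^*_j - b_j}{1 - p^*_j}. \tag{3*}$$

Therefore, at the maximum-likelihood value of $p^*_j$, $\partial L^*/\partial p^*_j = \partial L_1^*/\partial p^*_j$. Also, since $\theta^*_{ij}$ is
present only in $L_1^*$, $\partial L^*/\partial \theta^*_{ij} = \partial L_1^*/\partial \theta^*_{ij}$. In other words, maximizing $L^*$ with respect to the
unknown parameters is equivalent to maximizing $L_1^*$ and then (by (3*)) estimating $n^*_j$ by the
estimate of $b_j/p^*_j$. This allows us to make the simplification of speaking of (1*) as the likelihood
and excluding (2*) from the explicit maximization process.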

The next point to be made is that $\{\theta^*_{ij}\}$, $\{p^*_j\}$ must always be non-identifiable to the extent
of a multiplicative constant, for their likelihood is the same as that of $\{k\theta^*_{ij}\}$, $\{k^{-1}p^*_j\}$. In
particular it is not possible to estimate the survival probability $\phi^*$. To tie down this non-identifiability,
we shall work with the parameters $\theta_{ij} = \theta^*_{ij}/\phi^*$ and $p_j = \phi^* p^*_j$. These parameters
are identifiable (provided s ≥ t) and have fairly obvious probability interpretations.
Thus, $\theta_{ij}$ is the probability of any one of the $a_i$ individuals being in stratum j at the time of
the second sample given that it is alive at that time; while $p_j$ is the probability of any animal
surviving and, if it is in the jth stratum, of being caught there. Correspondingly, it is not
possible to estimate the $n^*_j$ (unless it is known a priori that $\phi^* = 1$) and, instead, we estimate
$n_j = n^*_j/\phi^*$, $n = \sum_j n_j$. $n_j$ is the number of unmarked individuals in the jth stratum divided
by the probability of survival for the marked individuals. Therefore, assuming that there
is no immigration during the experiment and that marked and unmarked individuals have
the same probability of survival, $n_j$ is roughly the number of unmarked individuals at the
time of the first sample who finish up in the jth stratum and n is roughly the total unmarked
population size at the time of the first sample.

To summarize the foregoing, our plan will be to maximize the likelihood of $\{\theta_{ij}\}$, $\{p_j\}$,
namely

$$p[\{c_{ij}\}\mid\{a_i\}] \propto \prod_i \Bigl\{ \Bigl(1 - \sum_j \theta_{ij} p_j\Bigr)^{a_i - c_{i.}} \prod_j (\theta_{ij} p_j)^{c_{ij}} \Bigr\}, \tag{1}$$

obtaining estimates $\{\hat\theta_{ij}\}$, $\{\hat p_j\}$ and then to estimate $n_j$ by $\hat n_j = b_j/\hat p_j$ and n by $\hat n = \sum_j \hat n_j$.
The number of parameters in (1) is st - s (for the $\theta_{ij}$ subject to the constraints $\sum_j \theta_{ij} = 1$,
i = 1, 2, ..., s) plus t (for the $p_j$), making a total of

$$N(\theta, p) = st + t - s.$$

Compare this with the number of parameters involved when (1) is expressed in terms of
$\psi_{ij} = \theta_{ij} p_j$, namely

$$N(\psi) = st.$$

(The point of introducing the ψ system is that it reveals in a simple manner certain important
features of the θ, p system.) We see that

$$N(\theta, p) \lesseqgtr N(\psi) \quad\text{according as}\quad s \gtreqless t,$$

and the three cases: s = t, s > t, s < t will be treated separately.
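
The comparison of the two parameter counts can be tabulated directly. The following is a small sketch in which the function names `parameter_counts` and `case` are hypothetical labels introduced for the illustration.

```python
def parameter_counts(s, t):
    """Return (N(theta, p), N(psi)) for s first-sample and t second-sample strata."""
    n_theta_p = (s * t - s) + t   # theta_ij under s row constraints, plus the p_j
    n_psi = s * t                 # one psi_ij for every pair of strata
    return n_theta_p, n_psi


def case(s, t):
    """Say which of the three cases of the text applies."""
    n_theta_p, n_psi = parameter_counts(s, t)
    if n_theta_p == n_psi:
        return "s = t: the two systems are interchangeable"
    if n_theta_p < n_psi:
        return "s > t: the psi_ij are functionally dependent"
    return "s < t: theta and p are not identifiable"


for s, t in [(3, 3), (4, 2), (2, 4)]:
    print(s, t, parameter_counts(s, t), case(s, t))
```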
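2-3. When s = t, $N(\theta,p) = N(\psi)$ and the two parametric systems are interchangeable
provided only that the transformation $\psi_{ij} = \theta_{ij} p_j$ is one-one. In matrix notation it is

$$\boldsymbol\Psi = \boldsymbol\Theta \mathbf D_p, \tag{2}$$

where $\boldsymbol\Psi = (\psi_{ij})$, $\boldsymbol\Theta = (\theta_{ij})$ and $\mathbf D_p = (\delta_{ij} p_j)$, the diagonal matrix whose elements are those
of the vector $\mathbf p = (p_j)$. Now $\boldsymbol\Theta \mathbf 1 = \mathbf 1$, where $\mathbf 1$ is the vector of s 1's, and therefore $\boldsymbol\Psi \mathbf D_p^{-1} \mathbf 1 = \mathbf 1$.
Hence, provided $\boldsymbol\Psi$ is non-singular, that is provided $\boldsymbol\Theta$ is non-singular,

$$\mathbf D_p^{-1} \mathbf 1 = \boldsymbol\Psi^{-1} \mathbf 1, \quad\text{or}\quad \boldsymbol\rho = \boldsymbol\Psi^{-1} \mathbf 1, \tag{3}$$

where $\boldsymbol\rho = (\rho_j) = (p_j^{-1})$. (2) and (3) show that the transformation is one-one if and only if $\boldsymbol\Theta$ is
non-singular.

Maximizing the likelihood with respect to $\{\psi_{ij}\}$ the estimates are

$$\hat\psi_{ij} = c_{ij}/a_i. \tag{4}$$

$\hat\psi_{ij}$ is obviously a consistent estimate of $\psi_{ij}$ as $a_i \to \infty$. Translating into the θ, p system, we
deduce that

$$\hat\theta_{ij} \hat p_j = c_{ij}/a_i$$

are the maximum-likelihood equations and give consistent estimates of $\{\theta_{ij}\}$, $\{p_j\}$ provided
that $\hat{\boldsymbol\Theta}$ and $\boldsymbol\Theta$ are non-singular. From (4) it follows that $\hat{\boldsymbol\Psi}$ (and therefore also $\hat{\boldsymbol\Theta}$) is non-singular
if and only if $\mathbf C = (c_{ij})$ is. If $\mathbf C$ is non-singular then, from (3) and (4),

$$\hat{\boldsymbol\rho} = \mathbf C^{-1} \mathbf D_a \mathbf 1 = \mathbf C^{-1} \mathbf a, \tag{5}$$

and since $\mathbf D_a^{-1} \mathbf C$ converges in probability to $\boldsymbol\Theta \mathbf D_p$ as the $a_i \to \infty$, $\hat{\boldsymbol\rho}$ converges to

$$\mathbf D_p^{-1} \boldsymbol\Theta^{-1} \mathbf 1 = \mathbf D_p^{-1} \mathbf 1 = \boldsymbol\rho$$

provided $\boldsymbol\Theta$ is non-singular (as already stated). On the other hand, if $\boldsymbol\Theta$ is singular, $\hat{\boldsymbol\rho}$ diverges.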
It is easily seen that the elements of $\hat{\boldsymbol\rho} = \mathbf C^{-1}\mathbf a$ are not necessarily positive and greater than
one. In other words, it is quite possible to obtain a $\hat p_j$ which does not lie in the interval [0, 1].
The first diagnosis of this anomaly that springs to mind is that the estimation procedure
should have somehow incorporated the restriction that all $p_j$ (and all $\theta_{ij}$) lie in [0, 1]. Usually
when estimating probabilities there is no need to do this explicitly as their estimates turn out
to be proportions and, indeed, as far as the $\psi_{ij}$ are concerned, this is what happens in the
present case. It is only in transferring from $\hat\psi_{ij} = c_{ij}/a_i$ to $\hat\theta_{ij}$ and $\hat p_j$ that the anomaly occurs.
As this transfer is made by imposing the constraints $\sum_j \theta_{ij} = 1$ (i = 1, 2, ..., s) it is these that
are at the root of the trouble and there are, broadly speaking, two possible diagnoses.
Namely, (i) the true $\phi^*_i$ are not equal or (ii) the true $\phi^*_i$ are equal but sampling errors are
overwhelmingly large. (i) can be allowed for by reducing the number of second-sample
strata and applying the theory of § 2-4. This is further discussed in the case of the numerical
example in §§ 6-2, 3. (ii) means that the $\hat p_j$ and the $\hat n_j = b_j/\hat p_j$ are virtually useless as estimates
of $p_j$ and $n_j$ and this would no doubt be confirmed by their having large variances. However,
this does not necessarily mean that the variance of $\hat n = \sum_j \hat n_j$ is also large as the covariances
of the $\hat n_j$ may well be large and negative. Again, see the discussion of the numerical example
in § 6-3.

When C is singular the likelihood equations in the ψ system are invalid and we must look
at those in the θ, p system. $\partial L/\partial p_j = 0$ gives

$$\sum_i \frac{(a_i - c_{i.})\,\theta_{ij} p_j}{1 - \sum_k \theta_{ik} p_k} = c_{.j}. \tag{6}$$

Introducing s Lagrangian multipliers $\{\lambda_i\}$ corresponding to the constraints $\sum_j \theta_{ij} = 1$,
$\partial L/\partial \theta_{ij} = 0$ gives

$$\frac{(a_i - c_{i.})\,\theta_{ij} p_j}{1 - \sum_k \theta_{ik} p_k} = c_{ij} + \lambda_i \theta_{ij}. \tag{7}$$

These equations have no easy solution except when s = t = 2. (They do, of course, have an
easy solution when C is non-singular. For (6) and (7) imply that $\sum_i \lambda_i \theta_{ij} = 0$ all j, which, if $\boldsymbol\Theta$
is non-singular, implies that $\lambda_i = 0$ all i and hence that $\hat\theta_{ij} \hat p_j = c_{ij}/a_i$.)
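
The step from (6) and (7) to the vanishing of the multipliers may be spelled out as follows. This is a sketch which assumes that the multipliers enter (7) through a term $+\lambda_i\theta_{ij}$ on the right-hand side; the opposite sign convention would not alter the conclusion. We write $S_i = \sum_k \theta_{ik} p_k$ for brevity, a symbol introduced only for this illustration.

```latex
\begin{align*}
\text{(7) summed over } i:\quad
  & \sum_i \frac{(a_i - c_{i.})\,\theta_{ij} p_j}{1 - S_i}
    = c_{.j} + \sum_i \lambda_i \theta_{ij}, \\
\text{(6)}:\quad
  & \sum_i \frac{(a_i - c_{i.})\,\theta_{ij} p_j}{1 - S_i} = c_{.j}, \\
\text{hence}\quad
  & \sum_i \lambda_i \theta_{ij} = 0 \quad (j = 1, \ldots, t),
    \quad\text{i.e. } \boldsymbol\lambda' \boldsymbol\Theta = \mathbf 0', \\
\text{and, with } \lambda_i = 0,\ \text{(7) summed over } j:\quad
  & \frac{(a_i - c_{i.})\, S_i}{1 - S_i} = c_{i.}
    \;\Longrightarrow\; S_i = \frac{c_{i.}}{a_i}
    \;\Longrightarrow\; \theta_{ij} p_j = \frac{c_{ij}}{a_i}.
\end{align*}
```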

It is obvious from the foregoing that the experimenter must try and avoid stratifications
which make $\boldsymbol\Theta$ singular; and the further $\boldsymbol\Theta$ is from singularity the more reliable are the
estimates $\{\hat\theta_{ij}\}$, $\{\hat p_j\}$, a fact which will be confirmed by the variance formulae in § 3. There are
a few trivial instances of singularity which may be foreseeable. For example, if $\theta_{ij} = 0$ all i,
the jth stratum is effectively non-existent. If $\theta_{ij} = \theta_{lj}$ for all j, the ith and lth strata can be
combined. If $\theta_{ij}/\theta_{ik} = w$, independent of i, the jth and kth strata can be combined into a Jth,
say, with $\theta_{iJ} = \theta_{ij} + \theta_{ik}$, $p_J = (w p_j + p_k)/(w + 1)$. But it will sometimes be less easy to avoid
the more general instances of singularity.

If $\boldsymbol\Theta$ is non-singular and the sample sizes are large, a singular C will be extremely improbable.
But, if it does occur, then, because of the difficulty of solving the likelihood
equations and in view of the fact that the variance formulae of § 3 are derived conditional
on C being non-singular, the best procedure will be to group two of the first-sample strata
and two of the second-sample strata in such a way as to eliminate the singularity. (The
first-sample grouping may be omitted, making s > t and involving us in the theory of § 2-4.)
If both groupings are made when s = t = 2 the result is an unstratified analysis and it is
interesting that the maximum-likelihood solution with $\lambda_1$ and $\lambda_2$ non-zero throws out strong
hints that this is the correct procedure. For the latter solution is

$$\hat\theta_{11} = \hat\theta_{21} = [(a_1 c_{21} - a_2 c_{11}) + (c_{11} c_{22} - c_{12} c_{21})]/(a_1 c_{2.} - a_2 c_{1.}),$$
$$\hat\theta_{12} = \hat\theta_{22} = 1 - \hat\theta_{11}, \qquad \hat p_1 = c_{.1}/(a\hat\theta_{11}), \qquad \hat p_2 = c_{.2}/(a\hat\theta_{12}).$$

(This solution, which always exists, is obviously not consistent.) When $c_{11} c_{22} - c_{12} c_{21} = 0$,
besides $\hat\theta_{11} = \hat\theta_{21}$ and $\hat\theta_{12} = \hat\theta_{22}$ which indicates a combination of the first-sample strata, it
turns out that $\hat p_1 = \hat p_2 = c/a$ which indicates a combination of the second-sample strata.

Finally, let us use (5) to estimate $\mathbf n = (n_j)$. Since $\hat{\mathbf n} = (b_j/\hat p_j) = \mathbf D_b \hat{\boldsymbol\rho}$,

$$\hat{\mathbf n} = \mathbf D_b \mathbf C^{-1} \mathbf a, \tag{8}$$

first found by Chapman & Junge under similar assumptions. Also, since $\hat n = \mathbf 1'\hat{\mathbf n}$,

$$\hat n = \mathbf b' \mathbf C^{-1} \mathbf a. \tag{9}$$
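
Formulae (5), (8) and (9) amount to solving one linear system. The sketch below does this in exact rational arithmetic; the helper names `solve` and `stratified_estimates` and the figures are assumptions made for illustration. The data were constructed from $\theta_{11} = \theta_{22} = 3/4$, $p = (1/2, 1/4)$, $a_1 = a_2 = 80$ and $n = (100, 200)$ with every count set equal to its expectation, so the estimates reproduce these values exactly.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: regroup the strata")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stratified_estimates(a, C, b):
    """Return rho_hat (5), the stratum sizes (8) and their total (9) for s = t."""
    rho = solve(C, a)                           # rho_hat = C^{-1} a
    n_j = [bj * r for bj, r in zip(b, rho)]     # n_hat_j = b_j * rho_hat_j
    return rho, n_j, sum(n_j)                   # n_hat = b' C^{-1} a


a = [80, 80]                  # marked animals released in each first-sample stratum
C = [[30, 5], [10, 15]]       # c_ij: recaptures by stratum of origin and of recapture
b = [50, 50]                  # unmarked animals caught in each second-sample stratum
rho, n_j, n_total = stratified_estimates(a, C, b)
print(rho, n_j, n_total)
```

A singular C raises an error here rather than returning a value, in keeping with the advice of the text to regroup strata in that event.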


2-4. When s > t, $N(\theta,p) < N(\psi)$ and the $\psi_{ij}$ are mathematically dependent functions of
$\{\theta_{ij}\}$, $\{p_j\}$. This means that the simple estimation equations $\hat\psi_{ij} = c_{ij}/a_i$ are invalid and we
have to solve (6) and (7). These equations have no explicit solution (except in the trivial
case t = 1) and, for any given data, they would therefore have to be solved iteratively and the
information matrix inverted to find error variances and covariances. If s and t are not small
this is not an attractive proposition but, fortunately, it is easily avoided as follows.

We continue to make $N(\theta,p) = N(\psi)$ by allowing the $\phi^*_i$ to differ. Let $\phi^* = (\sum_i \phi^*_i)/s$ and
let us work with parameters

$$\theta_{ij} = \theta^*_{ij}/\phi^*, \qquad \phi_i = \sum_j \theta_{ij}, \qquad p_j = \phi^* p^*_j.$$

These parameters correspond closely to the $\theta_{ij}$ and $p_j$ of § 2-2 but do not, unfortunately, have
simple probability interpretations. This gives a total of st - 1 + t parameters, the subtraction
of one being due to $\sum_{i,j} \theta_{ij} = s$. Instead of imposing s - 1 constraints, $\phi_1 = \phi_2 = \ldots = \phi_s$, as we
effectively did in § 2-3, let us instead only impose t - 1. Then $N(\theta,p) = st$ and (with certain
provisos) the estimation equations are $\hat\theta_{ij}\hat p_j = c_{ij}/a_i$. The t - 1 (independent) constraints will
usually take the form $\phi_i - \phi_l = 0$ or possibly $\phi_i + \phi_l - 2\phi_m = 0$ but we can write them
generally as

$$\sum_i u_{ki} \phi_i = 0 \quad (k = 1, 2, \ldots, t - 1).$$

Also $\sum_i u_{ti}\phi_i = 1$ if we define $u_{ti} = 1/s$. The matrix $\mathbf U = (u_{ki})$ (k = 1, 2, ..., t), (i = 1, 2, ..., s) is of full rank t
because the first t - 1 rows are independent among themselves and, provided they make
sense as constraints, the last row is independent of them. For, if not, the first t - 1 rows could
be combined to give $\sum_i \phi_i = 0$, which is nonsensical.
i 


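Let $\mathbf v'$ denote the 1 x t vector (0, 0, ..., 0, 1). Then

$$\mathbf U \boldsymbol\Theta \mathbf 1 = \mathbf v,$$

replacing $\boldsymbol\Theta \mathbf 1 = \mathbf 1$ of § 2-3. Adapting the argument of § 2-3 we find that, provided $\mathbf U\boldsymbol\Theta$ and
$\mathbf U \mathbf D_a^{-1} \mathbf C$ are non-singular or, in other words, provided $\boldsymbol\Theta$ and $\mathbf C$ are of full rank t, consistent
estimates of $\boldsymbol\rho = (p_j^{-1})$ and $\boldsymbol\Theta$ are given by

$$\hat{\boldsymbol\rho} = (\mathbf U \mathbf D_a^{-1} \mathbf C)^{-1} \mathbf v, \tag{5'}$$
$$\hat{\boldsymbol\Theta} = \mathbf D_a^{-1} \mathbf C \mathbf D_{\hat\rho}.$$

In the trivial case t = 1, the $\phi_i = \theta_{i1}$ are estimated without constraints giving

$$\hat p_1 = \Bigl(\sum_i c_{i1}/a_i\Bigr)\Big/ s, \qquad \hat\phi_i = c_{i1}/(a_i \hat p_1).$$

Finally, $\mathbf n = (n_j) = (n^*_j/\phi^*)$ is estimated by

$$\hat{\mathbf n} = \mathbf D_b (\mathbf U \mathbf D_a^{-1} \mathbf C)^{-1} \mathbf v \tag{8'}$$

and $n = n^*/\phi^*$ by

$$\hat n = \mathbf b' (\mathbf U \mathbf D_a^{-1} \mathbf C)^{-1} \mathbf v. \tag{9'}$$

If $\phi^*$ is also the 'average' probability of survival of the unmarked individuals, then $n_j$ and
n have the same interpretations as in § 2-3. It is of course not necessary to work with
$\phi^* = (\sum_i \phi^*_i)/s$ and any linear combination $\sum_i \nu_i \phi^*_i$, where $\sum_i \nu_i = 1$, would serve instead, and
would only involve changing $u_{ti}$ from 1/s to $\nu_i$.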
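
The estimates (5'), (8') and (9') can be sketched in the same way as those for s = t. The example below takes s = 3, t = 2 and the single constraint $\phi_1 - \phi_2 = 0$; the helper names and the figures are assumptions made for illustration. The data were constructed from $\phi = (9/10, 9/10, 12/10)$, $p = (1/2, 1/4)$ and $n = (100, 200)$ with every count set equal to its expectation.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("U D_a^{-1} C is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def stratified_estimates_general(a, C, b, U):
    """Estimates (5'), (8'), (9') for s > t; U is the t x s constraint matrix."""
    s, t = len(C), len(C[0])
    psi = [[Fraction(C[i][j], a[i]) for j in range(t)] for i in range(s)]  # D_a^{-1} C
    u_psi = [[sum(Fraction(U[k][i]) * psi[i][j] for i in range(s))
              for j in range(t)] for k in range(t)]                       # U D_a^{-1} C
    v = [0] * (t - 1) + [1]
    rho = solve(u_psi, v)                                                 # (5')
    phi = [sum(psi[i][j] * rho[j] for j in range(t)) for i in range(s)]   # row sums of Theta_hat
    n_j = [bj * r for bj, r in zip(b, rho)]                               # (8')
    return rho, phi, n_j, sum(n_j)                                        # (9')


a = [200, 200, 200]
C = [[60, 15], [30, 30], [60, 30]]
b = [50, 50]
U = [[1, -1, 0],                 # constraint phi_1 - phi_2 = 0
     [Fraction(1, 3)] * 3]       # last row u_ti = 1/s
rho, phi, n_j, n_total = stratified_estimates_general(a, C, b, U)
print(rho, phi, n_j, n_total)
```

The estimated $\hat\phi_i$ are returned as well, since they show how far the fitted survival ratios depart from equality.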

2-5. When s < t, $N(\theta,p) > N(\psi)$ and $\{\theta_{ij}\}$, $\{p_j\}$ are not identifiable; only $N(\psi)$ functions
of them are. If we try to estimate $\{p_j\}$ from the equations $\hat\psi_{ij} = c_{ij}/a_i$ (or, equivalently, from
(6) and (7)) we only get s equations: $\sum_j c_{ij}\hat p_j^{-1} = a_i$, for t unknowns. Since $\hat p_j$ cannot be determined,
neither can $\hat n_j = b_j/\hat p_j$.

At this point it is appropriate to consider the effect of assuming that marked and unmarked
animals have the same movement pattern since, with this assumption, n can be
estimated even though the $n_j$ cannot. The unmarked individuals are now brought into the
picture at the time of the first sample by letting $m_i$ denote the number in the ith stratum.
The density $p[\{b_j\}]$ becomes a convolution of s multinomial densities, the ith having parameters
$m_i, \psi_{i1}, \psi_{i2}, \ldots, \psi_{it}$, and the likelihood equations are intractable. The best we can do,
therefore, is to consider the moment equations

$$\sum_i m_i \psi_{ij} = b_j, \qquad a_i \psi_{ij} = c_{ij}.$$

From these we get t equations

$$\sum_i m_i c_{ij}/a_i = b_j \tag{10}$$

for s unknown parameters $\{m_i\}$. When s = t the solution of (10) is

$$\mathbf m' = \mathbf b' \mathbf C^{-1} \mathbf D_a,$$

giving

$$\sum_i m_i = \mathbf b' \mathbf C^{-1} \mathbf a \tag{11}$$

which is equivalent to (9). When s < t there are more than enough equations and two courses
are open: either equations (10) may be replaced by s linear combinations of them, or the
number of parameters can be increased by relaxing the equality of some of the movement
probabilities or by introducing some immigration parameters. We shall not go into details.
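
As a check on (10) and (11), the following sketch substitutes a candidate vector m into the moment equations. The function name `moment_residuals` and the figures are illustrative assumptions; m = (50, 250) is the value of $\mathbf b'\mathbf C^{-1}\mathbf D_a$ for these particular a, C and b.

```python
from fractions import Fraction


def moment_residuals(m, a, C, b):
    """Left side minus right side of (10): sum_i m_i c_ij / a_i - b_j, for each j."""
    s, t = len(C), len(C[0])
    return [sum(Fraction(m[i] * C[i][j], a[i]) for i in range(s)) - b[j]
            for j in range(t)]


a = [80, 80]
C = [[30, 5], [10, 15]]
b = [50, 50]
m = [50, 250]                      # candidate first-sample stratum sizes
residuals = moment_residuals(m, a, C, b)
print(residuals, sum(m))
```

Both residuals vanish, and the total of the $m_i$ is 300, which agrees with the value of $\mathbf b'\mathbf C^{-1}\mathbf a$ for the same figures.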

Finally we note that in deriving (10), we used $\psi_{ij}$ merely as the probability of being
caught in the jth stratum for an individual starting in the ith stratum and did not have to
express it as the product of a movement probability and a catching probability. Consequently,
if it is expressed in this form, it can be taken as $\theta_{ij} p_{ij}$ with no restriction on the
relative values of the $\phi_i$ $(= \sum_j \theta_{ij})$. Thus (as pointed out by Chapman & Junge), in return for
imposing the assumption that marked and unmarked animals have the same movement
pattern, we can allow the survival probabilities and the probabilities of capture to depend on
the stratum of origin. However, we consider that this involves losing more on the roundabouts
than is gained on the swings and we shall therefore keep to our original assumptions
for the remainder of the paper, and merely note that much of the ensuing theory for s ≥ t
could be adapted under the different assumptions for s < t.


3. FIRST AND SECOND MOMENTS OF THE ESTIMATES

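3-1. We first find the approximate bias of $\hat{\boldsymbol\rho}$ for the limit process: $\{\theta_{ij}\}$, $\{p_j\}$ constant and
all $a_i \to \infty$ in such a way that $a_i/a$ is fixed. Both $\boldsymbol\Theta$ and $\mathbf C$ will be assumed to be of rank t, and
we shall neglect the probability that $\mathbf C$ is of rank less than t. Let

$$\boldsymbol\Gamma = E[\mathbf C] = \mathbf D_a \boldsymbol\Theta \mathbf D_p \quad\text{and}\quad \mathbf C - \boldsymbol\Gamma = \mathbf Z = (z_{ij}).$$

When s = t, $\hat{\boldsymbol\rho} = \mathbf C^{-1}\mathbf a$ and

$$\mathbf C^{-1} = \bigl(\mathbf I - \boldsymbol\Gamma^{-1}\mathbf Z + (\boldsymbol\Gamma^{-1}\mathbf Z)^2 - \ldots + (-1)^{m-1}(\boldsymbol\Gamma^{-1}\mathbf Z)^{m-1}\bigr)\boldsymbol\Gamma^{-1} + (-1)^m (\boldsymbol\Gamma^{-1}\mathbf Z)^m \mathbf C^{-1}.$$

Therefore, since $\boldsymbol\Gamma^{-1}\mathbf a = \boldsymbol\rho$, we have

$$E[\hat{\boldsymbol\rho} - \boldsymbol\rho] = E\bigl[-(\boldsymbol\Gamma^{-1}\mathbf Z) + (\boldsymbol\Gamma^{-1}\mathbf Z)^2 - \ldots + (-1)^{m-1}(\boldsymbol\Gamma^{-1}\mathbf Z)^{m-1}\bigr]\boldsymbol\rho + \mathbf r,$$

where $\mathbf r = (r_j) = (-1)^m E[(\boldsymbol\Gamma^{-1}\mathbf Z)^m \mathbf C^{-1}\mathbf a]$, the remainder vector. Let $\mathbf F = (f_{jk}) = (\boldsymbol\Gamma^{-1}\mathbf Z)^m$
and let $C_{ik}$ denote the co-factor of $c_{ik}$ in $\mathbf C$. Then, since $|\mathbf C|$, the determinant of $\mathbf C$, is assumed
numerically greater than or equal to one,

$$|r_j| = \Bigl| E\Bigl[\sum_{k,i} f_{jk} C_{ik} a_i / |\mathbf C|\Bigr] \Bigr| \le \sum_{k,i} \bigl| E[f_{jk} C_{ik}] \bigr|\, a_i \le \sum_{k,i} \bigl(E[f_{jk}^2]\, E[C_{ik}^2]\bigr)^{1/2} a_i.$$

Now $E[f_{jk}^2]$ is the sum of products of terms of $\boldsymbol\Gamma^{-1}$ of total power 2m and moments of $\mathbf Z$ of
order 2m. The former are $O(a^{-2m})$ and the latter, since they are multinomial moments, are
$O(a^m)$. Also, $C_{ik}$ is numerically less than $(\prod_l c_{l.})/c_{i.}$, which is $O(a^{s-1})$. Therefore

$$|r_j| \le O(a^{-m + \frac12 m + s - 1 + 1}) = O(a^{-\frac12 m + s}) \to 0$$

for m > 2s. It is therefore permissible to write $E[\hat{\boldsymbol\rho} - \boldsymbol\rho]$ as the infinite series

$$E\bigl[-(\boldsymbol\Gamma^{-1}\mathbf Z) + (\boldsymbol\Gamma^{-1}\mathbf Z)^2 - (\boldsymbol\Gamma^{-1}\mathbf Z)^3 + \ldots\bigr]\boldsymbol\rho.$$

The first term is zero since $E[\mathbf Z] = \mathbf O$, the (2m)th is $O(a^{-m})$ and the (2m + 1)th is $O(a^{-m-1})$,
m ≥ 1.

Therefore the bias of $\hat\rho_j = 1/\hat p_j$ is $O(a^{-1})$ and this is asymptotically negligible compared with
its root mean square which we shall show in § 3-2 to be $O(a^{-1/2})$. The same holds true, when
s > t, of $\hat{\boldsymbol\rho} = (\mathbf U \mathbf D_a^{-1}\mathbf C)^{-1}\mathbf v$.

Next, let us augment the above limit process by supposing that all $n_j \to \infty$ in such a way
that $n_j/n$ and $a/n$ remain fixed. Let $\boldsymbol\beta = E[\mathbf b]$. As $\mathbf b$ is independent of $\mathbf C$,

$$E[\hat{\mathbf n} - \mathbf n] = \mathbf D_\beta E[\hat{\boldsymbol\rho} - \boldsymbol\rho], \qquad E[\hat n - n] = \boldsymbol\beta' E[\hat{\boldsymbol\rho} - \boldsymbol\rho],$$

which are both O(1) and negligible compared with the root mean squares which are $O(n^{1/2})$.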
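3-2. Because the bias of $\hat{\boldsymbol\rho}$ is negligible, its variance-covariance matrix is

$$E[(\hat{\boldsymbol\rho} - \boldsymbol\rho)(\hat{\boldsymbol\rho} - \boldsymbol\rho)'] = \boldsymbol\Sigma$$

say, and, arguing as in § 3-1, this matrix can be expanded as a doubly infinite series whose
terms are $O(a^{-m})$, m ≥ 1. Suppose s = t. Then the leading term, of size $O(a^{-1})$, is

$$E[\boldsymbol\Gamma^{-1}\mathbf Z \boldsymbol\rho\boldsymbol\rho' \mathbf Z' \boldsymbol\Gamma'^{-1}]. \tag{12}$$

The (i, l) element of $E[\mathbf Z\boldsymbol\rho\boldsymbol\rho'\mathbf Z']$ is

$$\sum_{j,k} E[z_{ij} z_{lk}]\, \rho_j \rho_k. \tag{13}$$

Now

$$E[z_{ij} z_{lk}] = \delta_{il}\, a_i\, [\delta_{jk}\theta_{ij} p_j - \theta_{ij} p_j \theta_{ik} p_k] \tag{14}$$

since $c_{ij}$ and $c_{lk}$ are independent when i ≠ l and, when i = l, have a multinomial distribution.
Substituting (14) in (13) we get

$$\delta_{il}\, a_i \Bigl[\sum_j \theta_{ij}\rho_j - \Bigl(\sum_j \theta_{ij}\Bigr)\Bigl(\sum_k \theta_{ik}\Bigr)\Bigr] = \delta_{il}\, a_i \Bigl[\sum_j \theta_{ij}\rho_j - 1\Bigr]$$

and this is the (i, l) element of $\mathbf D_a \mathbf D_\mu$, where $\boldsymbol\mu = (\mu_i)$ and

$$\mu_i = \sum_j \theta_{ij}\rho_j - 1. \tag{15}$$

Substituting in (12), we can say that

$$\boldsymbol\Sigma \simeq \boldsymbol\Gamma^{-1}\mathbf D_a \mathbf D_\mu \boldsymbol\Gamma'^{-1} = \mathbf D_\rho \boldsymbol\Theta^{-1} \mathbf D_\mu \mathbf D_a^{-1} \boldsymbol\Theta'^{-1} \mathbf D_\rho. \tag{16}$$

Notice that, when $\boldsymbol\Theta$ is near to singularity and the elements of $\boldsymbol\Theta^{-1}$ are large, $\hat{\boldsymbol\rho}$ will be an
inaccurate estimate of $\boldsymbol\rho$.

When s > t, (16) is changed to

$$\boldsymbol\Sigma \simeq \mathbf D_\rho (\mathbf U\boldsymbol\Theta)^{-1} \mathbf U \mathbf D_\mu \mathbf D_a^{-1} \mathbf U' (\mathbf U\boldsymbol\Theta)'^{-1} \mathbf D_\rho, \tag{16'}$$

where

$$\mu_i = \sum_j \theta_{ij}\rho_j - \phi_i^2. \tag{15'}$$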
I 
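3-3. Let $\mathbf y$ denote the deviation of $\mathbf b$ from its expected value $\boldsymbol\beta = \mathbf D_p \mathbf n$. Then

$$\hat{\mathbf n} - \mathbf n = \mathbf D_b \hat{\boldsymbol\rho} - \mathbf D_\beta \boldsymbol\rho = \mathbf D_\beta(\hat{\boldsymbol\rho} - \boldsymbol\rho) + \mathbf D_y \hat{\boldsymbol\rho}.$$

Therefore

$$E[(\hat{\mathbf n} - \mathbf n)(\hat{\mathbf n} - \mathbf n)'] = \mathbf D_\beta \boldsymbol\Sigma \mathbf D_\beta + E[\mathbf D_y \hat{\boldsymbol\rho}\hat{\boldsymbol\rho}' \mathbf D_y]. \tag{17}$$

The leading term in (17) is O(n) (= O(a)) and the others are $O(n^{-m})$, m ≥ 0. Retaining only
the former, the second term in (17) may be replaced by $E[\mathbf D_y \boldsymbol\rho\boldsymbol\rho' \mathbf D_y] = \mathbf D_n[\mathbf D_\rho - \mathbf I\phi^{*-1}]$
since $E[y_j y_k] = \delta_{jk}\, n_j p_j (1 - p_j \phi^{*-1})$. Substituting (16) when s = t

$$E[(\hat{\mathbf n} - \mathbf n)(\hat{\mathbf n} - \mathbf n)'] \simeq \mathbf D_n \boldsymbol\Theta^{-1} \mathbf D_\mu \mathbf D_a^{-1} \boldsymbol\Theta'^{-1} \mathbf D_n + \mathbf D_n(\mathbf D_\rho - \mathbf I\phi^{*-1}). \tag{18}$$

Since $(\hat n - n)^2 = \mathbf 1'(\hat{\mathbf n} - \mathbf n)(\hat{\mathbf n} - \mathbf n)'\mathbf 1$,

$$E[(\hat n - n)^2] \simeq \mathbf n'\boldsymbol\Theta^{-1}\mathbf D_\mu \mathbf D_a^{-1}\boldsymbol\Theta'^{-1}\mathbf n + \mathbf n'(\boldsymbol\rho - \mathbf 1\phi^{*-1}) = \sum_i \eta_i^2 \mu_i / a_i + \sum_j n_j(\rho_j - \phi^{*-1}), \tag{19}$$

where

$$\boldsymbol\eta' = (\eta_i) = \mathbf n'\boldsymbol\Theta^{-1}. \tag{20}$$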
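
To see the two components of (19) at work, the following sketch evaluates the leading term of $E[(\hat n - n)^2]$ for s = t = 2, where $\boldsymbol\Theta^{-1}$ can be written down explicitly. The function name `mse_total_2x2` and the parameter values are assumptions made for the illustration, and the survival probability $\phi^*$ is taken as one.

```python
from fractions import Fraction as F


def mse_total_2x2(theta, p, n, a, phi_star=F(1)):
    """Leading term (19) of E[(n_hat - n)^2] when s = t = 2 (illustrative helper)."""
    rho = [1 / F(pj) for pj in p]
    det = theta[0][0] * theta[1][1] - theta[0][1] * theta[1][0]
    if det == 0:
        raise ValueError("Theta is singular")
    inv = [[theta[1][1] / det, -theta[0][1] / det],
           [-theta[1][0] / det, theta[0][0] / det]]
    # (20): eta' = n' Theta^{-1}
    eta = [sum(n[j] * inv[j][i] for j in range(2)) for i in range(2)]
    # (15): mu_i = sum_j theta_ij rho_j - 1
    mu = [sum(theta[i][j] * rho[j] for j in range(2)) - 1 for i in range(2)]
    tagging = sum(eta[i] ** 2 * mu[i] / a[i] for i in range(2))
    sampling = sum(n[j] * (rho[j] - 1 / phi_star) for j in range(2))
    return eta, mu, tagging + sampling


theta = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
p = [F(1, 2), F(1, 4)]
n = [100, 200]
a = [80, 80]
eta, mu, mse = mse_total_2x2(theta, p, n, a)
print(eta, mu, mse)
```

With these figures the first sum in (19) contributes 2000 and the second 700, so the approximate root mean square error of $\hat n$ is about 52 on a total of 300.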


If $\boldsymbol\Theta$ is near to singularity the low accuracy of $\hat{\boldsymbol\rho}$ impairs that of $\hat{\mathbf n} = (\hat n_j)$ as is to be expected,
but the accuracy of $\hat n = \sum_j \hat n_j$ is almost unimpaired. We have so far specifically avoided any
assumption about the movement of the unmarked individuals between the two samples but,
to explain the last remark, suppose that they have the same movement pattern as the
marked individuals. Suppose, also, that there is no immigration and let $m_i$ denote the
number of individuals in the ith stratum. There are two ways of formally equating the movement
patterns. One way is to think of $n^*_j$ as a random variable with expectation $\sum_i m_i\theta^*_{ij}$, but
this involves some tiresome and unimportant complications. Instead, we shall continue to
think of $n^*_j$ as a parameter by the device of defining $\{\theta^*_{ij}\}$ by the movement of the unmarked
animals. Then $\sum_i m_i\theta^*_{ij} = n^*_j$ or $\sum_i m_i\theta_{ij} = n_j$ or

$$\mathbf m'\boldsymbol\Theta = \mathbf n'. \tag{21}$$

Thus, if the movement patterns are the same, $\boldsymbol\eta = \mathbf m$. The only point we wish to establish
from these considerations is that, even if the movement patterns are different, the $\eta_i$ are
likely to be positive and, since $\sum_i \eta_i = n$, the fact that $\boldsymbol\Theta$ is near to singularity has no effect on
(19). On the other hand, if $\mathbf n$ and $\boldsymbol\Theta$ are 'incompatible' in the sense that some of the $\eta_i$ are
negative, either because of very different movement patterns or because of immigrants
increasing the numbers of individuals in the t strata disproportionately, this does increase
$E[(\hat n - n)^2]$ because the coefficient of $\eta_i^2$ in (19) is positive. Of course, these conclusions only
apply to the leading terms of size O(n) in $E[(\hat n - n)^2]$. They do not apply to the terms of
smaller order and the latter might have to be taken into account if $\boldsymbol\Theta$ was very near to
singularity.


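When s > t,

$$E[(\hat{\mathbf n} - \mathbf n)(\hat{\mathbf n} - \mathbf n)'] \simeq \mathbf D_n (\mathbf U\boldsymbol\Theta)^{-1} \mathbf U \mathbf D_\mu \mathbf D_a^{-1} \mathbf U' (\mathbf U\boldsymbol\Theta)'^{-1} \mathbf D_n + \mathbf D_n(\mathbf D_\rho - \mathbf I\phi^{*-1}) \tag{18'}$$

and

$$E[(\hat n - n)^2] \simeq \sum_i \eta_i^2 \mu_i / a_i + \sum_j n_j(\rho_j - \phi^{*-1}), \tag{19'}$$

where

$$\boldsymbol\eta' = \mathbf n'(\mathbf U\boldsymbol\Theta)^{-1}\mathbf U. \tag{20'}$$

From (20') we have $\boldsymbol\eta'\boldsymbol\Theta = \mathbf n'$ and, if the movements of the marked and unmarked animals
are the same, $\mathbf m'\boldsymbol\Theta = \mathbf n'$. However, besides $\mathbf m$, there is now an infinity of vectors $\boldsymbol\xi$ such that
$\boldsymbol\xi'\boldsymbol\Theta = \mathbf n'$ and we cannot infer that $\boldsymbol\eta = \mathbf m$. Therefore, the above remarks for s = t cannot be
extended for s > t.

Finally, $\hat n$ is a consistent estimate of n in the sense that, for the limit process we are using,
$\hat n/n$ converges in probability to one.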


4. ‘CONTAGIOUS’ MOVEMENT AND CATCHING 


4-1. In writing $p[\{c_{ij}\}\mid\{a_i\}]$ as a product of multinomial densities (see (1)) it was assumed
that the $a_i$ individuals released in the ith stratum move and are caught or not caught
(i) independently of those released in any other stratum, and (ii) independently of each other.
(i) is a very reasonable assumption, but (ii) is less likely to hold true in practice and in this
section we examine the consequences of relaxing it.

Let us refer to the $a_i$ individuals as $I_1, I_2, \ldots, I_{a_i}$. In considering how, for instance, the
'experimental histories' of $I_1$ and $I_2$ may be dependent we must distinguish between the two
stages: movement to one of the t strata and, once there, being caught or not caught. To assist
this distinction, let $a_{ij}$ denote the number of individuals released in the ith stratum who are
alive in the jth at the time of the second sample. Then $a_i - \sum_j a_{ij}$ is the number who die or
emigrate between the two samples. We shall first find the distribution of the $a_{ij}$ and then the
conditional distribution of $c_{ij}$ given $a_{ij}$ and, in doing this, it is convenient to revert to the
original parameters $\{\theta^*_{ij}\}$, $\{p^*_j\}$ which are now interpreted as common marginal probabilities,
that is the probabilities for any single individual disregarding the histories of the other
individuals.

We note that, whatever form the dependence takes,

$$E[c_{ij}] = E[E[c_{ij}\mid a_{ij}]] = E[a_{ij} p^*_j] = a_i\theta^*_{ij} p^*_j$$

as in § 2 and therefore the $\hat\psi_{ij}$ are still consistent provided their variances still tend to zero as
$a_i \to \infty$. It is these variances in which we shall be mainly interested, and since they principally
depend on $\operatorname{var}(c_{ij})$, $\operatorname{cov}(c_{ij}, c_{ik})$ we must find how the latter are changed by dependence
from $a_i\theta^*_{ij}p^*_j(1 - \theta^*_{ij}p^*_j)$ and $-a_i\theta^*_{ij}p^*_j\theta^*_{ik}p^*_k$, respectively.

4-2. To construct $p[\{a_{ij}\}]$ it is necessary to specify how such probabilities as

$$P[I_1 \to j \mid I_2 \to k,\ I_3 \to l]$$

(using a self-explanatory notation) differ from $\theta^*_{ij}$. The assumption that there is no difference
is really justified only if $I_1, I_2, \ldots, I_{a_i}$ are each released at a random point in the stratum and
this would be difficult to achieve in practice. Two possibilities which are more likely are that
they are released either close together in a randomly chosen subarea or over a carefully
spaced grid of points. In the former case we should expect that, for instance

$$P[I_1 \to j \mid I_2 \to j,\ I_3 \to j] > P[I_1 \to j \mid I_2 \to j] > P[I_1 \to j]$$

and in the latter case a reversal of these inequalities. In other words, if the animals start
close together, they are more likely than otherwise to move to the same stratum whereas, if
they start with maximum possible distances between them, they are more likely than otherwise
to move to different strata. The point of release may be thought of as determining the
initial movement of an animal to the extent that the latter depends not only on such things 
as topography, wind or current, temperature, the local food distribution, but also, especially 
if the species is gregarious, on the location and movement of the nearest group of unmarked 
animals. (It should be noted that we are fortunately not required to try and describe the very 
complex interdependence of the movements of the unmarked animals nor the dependence of 
the marked on the unmarked, only the interdependence of the marked.) 

Let us call the above two types of dependence 'positive' and 'negative', respectively.
Then, as a first approximation, we may describe them in terms of the statistical concept of
contagion by a simple generalization of Pólya's urn model. Since death is a possible contingency
it is reasonable to assume that it is contagious also and it can be thought of formally
as the (t + 1)th stratum, letting $a_{i,t+1} = a_i - \sum_j a_{ij}$ and $\theta^*_{i,t+1} = 1 - \phi^*_i$.

Consider an urn containing $f_i$ balls of which $f_{ih}$ are marked 'h' and where $f_{ih}/f_i = \theta^*_{ih}$
(h = 1, 2, ..., t + 1). Thus the proportions of balls in the urn are the marginal probabilities.
If there is positive contagion, the conditional probability $P[I_r \to h_r \mid I_1 \to h_1, \ldots, I_{r-1} \to h_{r-1}]$
is the proportion marked '$h_r$' after adding $g_i$ marked '$h_1$', $g_i$ marked '$h_2$', ..., $g_i$ marked '$h_{r-1}$'.
In this way we obtain

$$P[I_1 \to h_1, \ldots, I_{a_i} \to h_{a_i}] = \prod_{h=1}^{t+1} \frac{(f_{ih}/g_i + a_{ih} - 1)!}{(f_{ih}/g_i - 1)!} \Big/ \frac{(f_i/g_i + a_i - 1)!}{(f_i/g_i - 1)!}.$$

(If unequal numbers $g_{ih_1}, g_{ih_2}, \ldots, g_{ih_{r-1}}$ of balls are added, the resulting density is very difficult
to handle and, moreover, depends on the ordering of the individuals, a feature we do
not desire.) Hence

$$p[\{a_{ih}\}] = \prod_{h=1}^{t+1} \binom{f_{ih}/g_i + a_{ih} - 1}{a_{ih}} \Big/ \binom{f_i/g_i + a_i - 1}{a_i}. \tag{22}$$
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It is readily deduced from (22) that 
var (a,,) = K,a,0%,(1—O%,), cov (ai, 44.) = —K,a,7, 93;,, (23) 
where K; = (f;/9;+4,)/(f:/9;,+1). Thus 1 < K <a,, the extreme values corresponding 
respectively to no contagion and identical behaviour of the a; individuals. 
If there is negative contagion, balls are subtracted instead of added and provided f; > 9,a;, 
an obviously necessary condition, (23) still holds with K, = (f;/9;—4,)/(f;/g;—1). Thus 
K;, < 1 and, in practice, would be substantially greater than zero. 
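
The variance inflation in (23) can be checked exactly on a small urn. The sketch below is illustrative only: it collapses the $t+1$ strata into two kinds of ball (‘$h$’ and ‘not $h$’), and the function names `polya_pmf`, `inflation_factor` and `mean_and_variance` are hypothetical. Writing $K_i$ as $(f_i + g_ia_i)/(f_i + g_i)$, with negative $g_i$ standing for subtraction, is an assumption of the sketch that covers both kinds of contagion in one formula.

```python
from fractions import Fraction
from math import comb


def polya_pmf(f_h, f_rest, g, a):
    """Exact distribution of the number of draws marked 'h' among a draws.

    The urn starts with f_h balls marked 'h' and f_rest others. After each
    draw, g balls of the drawn kind are added (g < 0 subtracts them).
    """
    f = f_h + f_rest
    pmf = []
    for k in range(a + 1):
        # Probability of one particular ordering with k draws marked 'h'.
        prob = Fraction(1)
        for r in range(k):
            prob *= f_h + g * r
        for r in range(a - k):
            prob *= f_rest + g * r
        for m in range(a):
            prob /= f + g * m
        # All orderings are equally likely, so multiply by their number.
        pmf.append(comb(a, k) * prob)
    return pmf


def inflation_factor(f, g, a):
    """K = (f/g + a) / (f/g + 1), written so that g = 0 gives K = 1."""
    return Fraction(f + g * a, f + g)


def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


# Positive contagion: 3 balls marked 'h' out of 10, g = 2 added per draw, a = 5.
theta = Fraction(3, 10)
_, var = mean_and_variance(polya_pmf(3, 7, 2, 5))
print(var == inflation_factor(10, 2, 5) * 5 * theta * (1 - theta))
```

With `g = -1` the same functions reproduce sampling without replacement, for which the factor is less than one.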


4-3. If the catching in the $j$th stratum is not uniform but is concentrated in one or more
subareas and if, having a common stratum of origin, the $a_{ij}$ individuals are not uniformly
distributed in the $j$th stratum, there will be a certain amount of positive dependence in
their being caught or not caught. Otherwise, it is fair to assume independence. (We shall
ignore the possible negative dependence of the catching of individuals from different strata
of origin.) A density analogous to (22) can be constructed for $p[c_{ij} \mid a_{ij}]$ and

$$E[(c_{ij} - a_{ij}p^*_j)^2 \mid a_{ij}] = L_{ij}\,a_{ij}\,p^*_j(1 - p^*_j), \qquad (24)$$

where $1 \le L_{ij} \le a_{ij}$.

4-4. In the following we can generalize from specifically contagious dependence as
described by urn models to any form of dependence which alters the variances and covariances
in the simple manner given by (23) and (24), where $K_i \ge 1$ and $L_{ij} \ge 1$ (but probably
not much greater than one). Combining (23) and (24),

$$\operatorname{var}(c_{ij}) = L_{ij}a_i\theta^*_{ij}p^*_j(1 - p^*_j) + K_ia_i\theta^*_{ij}(1 - \theta^*_{ij})p^{*2}_j,$$
$$\operatorname{cov}(c_{ij}, c_{ij'}) = -K_ia_i\theta^*_{ij}p^*_j\theta^*_{ij'}p^*_{j'}.$$

Suppose $s = t$. Then using these new variances and covariances in place of (14), we easily
find that it is still true that

$$\Sigma \sim D_\rho\Theta^{-1}D_\mu D_a^{-1}\Theta'^{-1}D_\rho$$

except that, instead of (15),

$$\mu_i = \sum_j L_{ij}\theta_{ij}/p_j - \Big(\sum_j L_{ij}\theta_{ij}\Big)\Big/\phi^* + K_i(1/\phi^* - 1).$$

We can now note the interesting fact that the factor $K_i$ has barely any effect on the
efficiency of the $\hat\rho_j$ and what little effect there is can be shown to be due to dependent mortality
(or emigration) and not to dependent movement. Dependent catching, on the other hand,
increases the largest terms in $\mu_i$ in the ratios $L_{ij}:1$ and therefore also increases $\Sigma$ approximately
in the ratio $L:1$ where $L$ is a suitably defined average of the $L_{ij}$.

When $s > t$, instead of (15'),

$$\mu_i = \sum_j L_{ij}\theta_{ij}/p_j - \Big(\sum_j L_{ij}\theta_{ij}\Big)\Big/\phi^* + K_i(\phi_i/\phi^* - \phi_i^2)$$

and similar conclusions obtain.
We have, of course, only dealt with the leading terms in the expansion of $\Sigma$ but the others
may still be neglected unless either form of dependence is very considerable.


5. VALIDITY OF THE PETERSON ESTIMATE 


5-1. It was taken for granted in §§ 2, 3, 4 that the experimenter knows how the population
is stratified and is able to conduct the experiment in the requisite manner. Suppose, however,
that he does not know or that he is unable to mark the animals in $s$ distinct ways and record
$t$ separate catches. He will then use the Peterson estimate $\hat n_P = ab/c$ (or its unbiased version
$(a+1)b/(c+1)$).

In § 5-2 we examine the legitimacy of doing this and then, in § 5-3, the appropriateness of
the corresponding formula for var$(\hat n_P)$.

The Peterson estimate has more than simplicity to recommend it: even if a stratified
experiment is performed, producing an estimate $\hat n$, $\hat n_P$ is preferable if it is valid as it generally
has a smaller variance than $\hat n$ (see § 5-5). In § 5-4, therefore, we consider tests for its validity.
These tests are also necessary to complement the estimation theory of §§ 2, 3 and they
reveal important facts about the movement and catching of the marked individuals.

Except where otherwise stated, there will be no restriction on the relative values of $s$ and $t$
nor on the relative values of the $\phi^*_i$. We shall work with the parameters

$$\theta_{ij} = \theta^*_{ij}/\phi^*, \quad \phi_i = \sum_j \theta_{ij}, \quad p_j = \phi^*p^*_j, \quad n_j = n^*_j/\phi^*, \quad n = \sum_j n_j,$$

where $\phi^* = \big(\sum_{i,j}\theta^*_{ij}\big)/s$. Let

$$\beta = \sum_j n_jp_j, \qquad \gamma = \sum_{i,j} a_i\theta_{ij}p_j, \qquad n_P = a\beta/\gamma.$$

For the limit process: $a_i \to \infty$ and $n_j \to \infty$ in such a way that $a_i/a$, $n_j/n$ and $a/n$ are all
constant, $n_P/n$ remains constant and $\hat n_P/n$ converges in probability to $n_P/n$. Therefore, $\hat n_P$
estimates $n$ if $n_P = n$, that is if

$$n\sum_{i,j} a_i\theta_{ij}p_j = a\sum_j n_jp_j. \qquad (25)$$

(25) has an infinity of ‘accidental’ solutions but we shall be concerned only with those having
a simple physical interpretation. One of them is

$$\sum_i a_i\theta_{ij} = an_j/n, \qquad (26)$$

that is, the expected number of marked animals in the $j$th stratum is proportional to the
number of unmarked. A special case of (26) is

$$\theta_{ij} = n_j/n.$$

Next, consider the condition that

$$p_j = p \ \text{say}. \qquad (27)$$

This makes $n_P = n/\sum_i (a_i/a)\phi_i = n^*/\sum_i (a_i/a)\phi^*_i$ which is equal to $n = n^*/\sum_i (\phi^*_i/s)$ if the $\phi^*_i$ or
the $a_i$ are equal and differs very little if they are not.


A further condition for the validity of $\hat n_P$ can be obtained by making a minor assumption
about the relative values of $n$ and $\Theta$. It is that there exists $\xi = (\xi_i)$ such that

$$n_j = \sum_i \xi_i\theta_{ij}. \qquad (28)$$

$\xi$ always exists if the movements of the marked and unmarked animals are the same, since it
may then be taken equal to $m$, the vector of first-sample strata sizes. Otherwise, if $s \ge t$,
$\xi$ exists provided $\Theta$ is of full rank $t$. (When $s = t$, $\xi' = \eta' = n'\Theta^{-1}$ and when $s > t$ there is an
infinity of such $\xi$, some of which at least are in the form $\eta' = n'(U\Theta)^{-1}U$ where $U$ is any
matrix of the type described in § 2-6.) If $s < t$, $\xi$ does not necessarily exist. Anyhow,
assuming that there is a $\xi$ satisfying (28), (25) becomes

$$n\sum_{i,j} a_i\theta_{ij}p_j = a\sum_{i,j} \xi_i\theta_{ij}p_j. \qquad (29)$$

Consider

$$\sum_j \theta_{ij}p_j = p \ \text{say}. \qquad (30)$$

This makes

$$n_P = \sum_i \xi_i = n^*\Big/\sum_i \Big(\xi_i\Big/\sum_k \xi_k\Big)\phi^*_i$$

which equals $n$ if the $\phi^*_i$ or $\xi_i$ are equal and, otherwise, will not usually differ very much
from $n$. If $\sum_j \theta_{ij} = 1$, (27) is a special case of (30). Another special case of (30) is

$$\theta_{ij} = \theta_j \ \text{say}.$$

Lastly, when the movement patterns are the same, (26) becomes $\sum_i a_i\theta_{ij} = (a/n)\sum_i m_i\theta_{ij}$
and a particular solution of this equation is

$$a_i/m_i = a/n,$$

the first-sample counterpart of $p_j = p$.


5-3. Having listed the various conditions under which $\hat n_P$ is valid, namely (26), (27) and
(30), let us suppose that the experimenter assumes as a matter of faith that one of them holds
true and conducts an unstratified experiment. We now check that the formula for var$(\hat n_P)$
that he is obliged to use is an appropriate one.

It is an easy matter to show that

$$E[(\hat n_P - n_P)^2] \sim (\beta^2a^2/\gamma^4)\operatorname{var}(c) + (a^2/\gamma^2)\operatorname{var}(b) \qquad (31)$$

by retaining only leading terms as in §§ 3-2, 3-3. Now

$$\operatorname{var}(c) = \sum_i \operatorname{var}(c_{i.}) = \sum_i a_i\Big(\sum_j \theta_{ij}p_j\Big)\Big(1 - \sum_j \theta_{ij}p_j\Big)$$
$$\le \Big(\sum_{i,j} a_i\theta_{ij}p_j\Big)\Big(1 - \sum_{i,j} a_i\theta_{ij}p_j/a\Big) = a(\gamma/a)(1 - \gamma/a) = \text{‘var}(c)\text{’ say}.$$

var$(c)$ = ‘var$(c)$’ only when $\sum_j \theta_{ij}p_j = p$ say, that is when $c$ is a true binomial variable. The
experimenter has no other course but to use ‘var$(c)$’ in (31) and we see that, in doing so, he
overestimates var$(c)$ if anything and therefore errs on the right side; but the difference is
extremely small. Similarly, ‘var$(b)$’ $= n(\beta/n)(1 - \beta/n\phi^*)$ overestimates

$$\operatorname{var}(b) = \sum_j n_jp_j(1 - p_j/\phi^*)$$

slightly unless $p_j = p$ say. Inserting ‘var$(c)$’ and ‘var$(b)$’ in (31),

$$\text{‘}E[(\hat n_P - n_P)^2]\text{’} \sim (\beta^2a^2/\gamma^3)(1 - \gamma/a) + (a^2\beta/\gamma^2)(1 - \beta/n\phi^*) \qquad (32)$$

which is estimated by replacing $\beta$ by $b$, $\gamma$ by $c$ and $n$ by $\hat n_P$.

If there is movement dependence among the marked animals, it has a small but negligible
effect on the second of the four terms in (32) and, in view of the fact that the first term is by
far the largest, we can certainly ignore this effect. Catch dependence does increase the
largest term, however, as in § 4-4. For, if $a_r$, $c_r$ and $p_r$ denote the number of marked animals
present, the number caught and the marginal probability of being caught in a given subregion
and, if $E[(c_r - a_rp_r)^2 \mid a_r] = La_rp_r(1 - p_r)$, the first term in (32) is multiplied by $L$.
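
As a numerical illustration, the Peterson estimate and the plug-in form of (32) can be computed as follows. This is a minimal sketch rather than the author's computation: the function names are hypothetical, $\phi^*$ is set to one by default as an assumption, and the totals used ($a = 2351$, $b = 9952$, $c = 520$) are those of the salmon data analysed in § 6.

```python
def peterson_estimate(a, b, c, unbiased=False):
    """Estimate of the number n of unmarked animals: ab/c or (a+1)b/(c+1)."""
    if unbiased:
        return (a + 1) * b / (c + 1)
    return a * b / c


def peterson_mse(a, b, c, phi_star=1.0):
    """Plug-in version of (32): beta -> b, gamma -> c, n -> ab/c.

    phi_star is unknown in an unstratified experiment; 1.0 is an assumed value.
    """
    n_hat = a * b / c
    term_c = (b ** 2 * a ** 2 / c ** 3) * (1 - c / a)               # from 'var(c)'
    term_b = (a ** 2 * b / c ** 2) * (1 - b / (n_hat * phi_star))   # from 'var(b)'
    return term_c + term_b


a, b, c = 2351, 9952, 520
print(round(peterson_estimate(a, b, c, unbiased=True)))
print(peterson_mse(a, b, c) ** 0.5)  # approximate standard error
```

The first term, which comes from the recapture count $c$, dominates the second, in line with the remark that it is by far the largest.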
5-4. Reverting to the stratified experiment, we turn from estimation to testing hypotheses.
In most cases, the likelihood-ratio method is an obvious one to use and a great
variety of hypotheses which might be of interest in particular experiments can be tested in
this way. However, our attention will be confined to those tests which bear on the validity
of $\hat n_P$ and, moreover, which involve only simple functions of $\{a_i\}$, $\{b_j\}$, $\{c_{ij}\}$. Consider

$$H_1: \sum_j \theta_{ij}p_j = p \ \text{say}; \qquad H_2: p_j = p \ \text{say}; \qquad H_3: \sum_i a_i\theta_{ij} = an_j/n;$$
$$H_4: \theta_{ij} = \theta_j \ \text{say}; \qquad H_5: \theta_{ij} = n_j/n.$$

Let $H$ denote the general hypothesis which puts no restrictions on $\{\theta_{ij}\}$, $\{p_j\}$ except, of course,
$\sum_{i,j} \theta_{ij} = s$.

On $H$, the $\psi_{ij} = \theta_{ij}p_j$ are independent and, substituting $\hat\psi_{ij} = c_{ij}/a_i$, the maximum value
of the log-likelihood of $\{\theta_{ij}\}$, $\{p_j\}$ is

$$\hat L = \sum_{i,j} c_{ij}\log c_{ij} + \sum_i (a_i - c_{i.})\log(a_i - c_{i.}) - \sum_i a_i\log a_i.$$

On $H_1$, the log-likelihood is

$$L_1 = \sum_{i,j} c_{ij}\log x_{ij} + c\log p + (a - c)\log(1 - p),$$

where $x_{ij} = \theta_{ij}p_j/p$. Maximizing subject to $\sum_j x_{ij} = 1$,

$$\hat L_1 = \sum_{i,j} c_{ij}\log c_{ij} - \sum_i c_{i.}\log c_{i.} + c\log c + (a - c)\log(a - c) - a\log a.$$

The number of independent, identifiable parameters in $L$ is $N(\psi) = st$ and the number in $L_1$
is $st - s + 1$. To test $H_1$ against $H$ we use the fact that, on $H_1$, $2(\hat L - \hat L_1) = \chi^2_{s-1}$ approximately.

$$2(\hat L - \hat L_1) = 2\sum_i c_{i.}\log\frac{ac_{i.}}{a_ic} + 2\sum_i (a_i - c_{i.})\log\frac{a(a_i - c_{i.})}{(a - c)a_i}$$

and this expression is asymptotically equivalent to

$$\sum_i \frac{(c_{i.} - a_ic/a)^2}{a_ic/a} + \sum_i \frac{(a_i - c_{i.} - a_i(a - c)/a)^2}{a_i(a - c)/a}.$$

Therefore, the test is asymptotically equivalent to a $\chi^2$ goodness-of-fit test for proportionality
in the last two columns of (33).

$$\begin{array}{ccc|c|c}
c_{11} & \cdots & c_{1t} & a_1 - c_{1.} & c_{1.} \\
c_{21} & \cdots & c_{2t} & a_2 - c_{2.} & c_{2.} \\
\vdots & & \vdots & \vdots & \vdots \\
c_{s1} & \cdots & c_{st} & a_s - c_{s.} & c_{s.} \\ \hline
c_{.1} & \cdots & c_{.t} & & \\
b_1 & \cdots & b_t & &
\end{array} \qquad (33)$$

This is hardly surprising when it is remembered that, on $H_1$, $E[c_{i.}] = a_ip$.
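
The $\chi^2$ form of the test of $H_1$ is easily computed. The sketch below is illustrative only, with a hypothetical function name; it takes the two columns $(c_{i.})$ and $(a_i - c_{i.})$ and returns the statistic together with its $s - 1$ degrees of freedom. The figures used are the row totals of the grouped salmon data of Table 2 in § 6.

```python
def proportionality_chi2(caught, not_caught):
    """Chi-square test that c_i. and a_i - c_i. are proportional over strata i."""
    a_i = [c + u for c, u in zip(caught, not_caught)]
    a = sum(a_i)
    c = sum(caught)
    stat = 0.0
    for ai, ci, ui in zip(a_i, caught, not_caught):
        expected_c = ai * c / a          # a_i c / a
        expected_u = ai * (a - c) / a    # a_i (a - c) / a
        stat += (ci - expected_c) ** 2 / expected_c
        stat += (ui - expected_u) ** 2 / expected_u
    return stat, len(caught) - 1


stat, df = proportionality_chi2([90, 180, 183, 67], [394, 515, 590, 332])
print(round(stat, 2), df)
```

The statistic is then referred to the $\chi^2$ distribution with the returned degrees of freedom.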

There is no very satisfactory test of $H_2$ and, to see why, three possibilities have to be
considered. First, suppose that nothing is assumed about the relative values of the $\phi_i$. The
$\psi_{ij} = \theta_{ij}p$ are then mathematically independent and $L_2$ is not identifiably distinct from $L$.
Therefore $H_2$ cannot be tested. Secondly, if $q$ constraints are imposed on the $\phi_i$, where
$0 < q < s - 1$, the maximization of $L_2$ is awkward and, while a likelihood ratio test based on
$\chi^2_q$ can be constructed, it does not take a simple form. Lastly, if it is assumed that

$$\phi_i = \sum_j \theta_{ij} = 1, \ \text{all } i,$$

$$L_2 = \sum_{i,j} c_{ij}\log\theta_{ij} + c\log p + (a - c)\log(1 - p),$$

and this is not identifiably distinct from $L_1$. Therefore, in attempting to test the hypothesis:
$p_j$ independent of $j$ and $\sum_j \theta_{ij} = 1$, one is really testing the more general hypothesis: $\sum_j \theta_{ij}p_j$
independent of $i$.

As $H_3$ involves the $n_j$ it cannot be tested by the likelihood-ratio method without involving
the density $p[\{b_j\}]$ as well as $p[\{c_{ij}\} \mid \{a_i\}]$. We shall not do this but, instead, simply note that
$E[c_{.j}] = \big(\sum_i a_i\theta_{ij}\big)p_j$ and $E[b_j] = n_jp_j$ and that a $\chi^2_{t-1}$ test of
proportionality in the last two rows of (33) tests the hypothesis

$$\sum_i a_i\theta_{ij} = \nu n_j, \quad \text{or} \quad \sum_i a_i\theta_{ij} = \Big(\sum_i a_i\phi_i\Big)n_j/n.$$

This is not quite the same as $H_3$ and, if true, implies that $n_P = n/\{\sum_i (a_i/a)\phi_i\}$ rather than $n$
but, as we pointed out in § 5-2, this sort of difference is immaterial.

$H_4$ is a particular case of $H_1$ and would not therefore be tested unless $H_1$ was accepted first.
On $H_4$, the log-likelihood is $L_4 = \sum_j c_{.j}\log\psi_j + (a - c)\log(1 - \sum_j \psi_j)$, where $\psi_j = \theta_jp_j$. $L_4$ is
maximized when $\hat\psi_j = c_{.j}/a$ giving $\hat L_4 = \sum_j c_{.j}\log c_{.j} + (a - c)\log(a - c) - a\log a$. On $H_4$,
$2(\hat L - \hat L_4) = \chi^2_{(s-1)t}$ approximately and this is equivalent to a $\chi^2$ test on the contingency table
formed by the first $s$ rows and $t + 1$ columns of (33). The $\chi^2_{(s-1)(t-1)}$ test on the first $s$ rows and
$t$ columns, that is on the $c_{ij}$ alone, is easily seen to test $H_4$ against $H_1$ or, viewed differently,
it tests the hypothesis: $\theta_{ij} = \phi_i\theta_j$ against $H$.

$H_5$ cannot be tested as $L_5$ is not identifiably distinct from $L_4$.


5-5. The theoretical discussion of the stratified experiment is concluded by a comparison
of the variances of $\hat n$ and $\hat n_P$. Much simplicity is gained and nothing essential is lost by
assuming in what follows that all $\phi_i = 1$. One advantage of this assumption is that it makes
$n_P = n$ exactly whenever $\hat n_P$ is valid. Writing $n_P = n$, the formulae to be compared are

$$E[(\hat n_P - n)^2] \sim (\beta^2a^2/\gamma^4)\operatorname{var}(c) + (a^2/\gamma^2)\operatorname{var}(b) = P_1 + P_2 \ \text{say},$$

and

$$E[(\hat n - n)^2] \sim \sum_i \eta_i^2\mu_i/a_i + \sum_j n_j(1/p_j - \phi^{*-1}) = S_1 + S_2 \ \text{say},$$

where $\eta' = n'\Theta^{-1}$ or $n'(U\Theta)^{-1}U$ according as $s = t$ or $s > t$, and $\mu_i = \sum_j \theta_{ij}/p_j - 1$.

It was observed in § 5-3 that $P_2 = (a^2/\gamma^2)\sum_j n_jp_j(1 - p_j/\phi^*) \le (a^2/\gamma^2)\beta(1 - \beta/n\phi^*)$ with
equality only if $p_j = p$. Therefore, since $a\beta/\gamma = n$, $P_2 \le n^2/\beta - n/\phi^*$. In much the same way,
$S_2 \ge n^2/\beta - n/\phi^*$ with equality only if $p_j = p$. Therefore $S_2 \ge P_2$.

We also observed in § 5-3 that $P_1 \le (\beta^2a^2/\gamma^4)\gamma(1 - \gamma/a) = n^2/\gamma - n^2/a$ with equality only if
$\sum_j \theta_{ij}p_j = p$. To compare $S_1$ with $P_1$, it is necessary to consider separately the conditions
under which $\hat n_P$ is valid and these boil down to either $H_1$ or $H_3$.

On $H_1$,

$$\sum_j \theta_{ij}p_j = p. \qquad (34)$$

Now, for $\hat n$ to be consistent it is necessary that $\Theta$ is of full rank $t$ and, this being so, (34)
implies $p_j = p$. Thus, it is sufficient to consider $H_2$. On $H_2$, $\mu_i = 1/p - 1 = a/\gamma - 1$. Therefore

$$S_1 = (a/\gamma - 1)\sum_i \eta_i^2/a_i = (a/\gamma - 1)\Big[n^2/a + \sum_i (a\eta_i - na_i)^2/a_ia^2\Big]$$
$$\ge (a/\gamma - 1)n^2/a = n^2/\gamma - n^2/a \ge P_1.$$

On $H_3$,

$$\sum_i a_i\theta_{ij} = an_j/n.$$

Also,

$$\sum_i \eta_i\theta_{ij} = n_j. \qquad (35)$$

Therefore

$$S_1 = \sum_i (\eta_i/a_i)^2a_i\mu_i = (n^2/a^2)\sum_i a_i\mu_i + 2(n/a)\sum_i (\eta_i/a_i - n/a)a_i\mu_i + \sum_i (\eta_i/a_i - n/a)^2a_i\mu_i.$$

The first of these terms is $(n^2/a^2)\big[\sum_{i,j} a_i\theta_{ij}/p_j - a\big]$ and, using $H_3$, equals

$$(n/a)\Big[\sum_j n_j/p_j - n\Big] = (n/a)S_2 \ge n^3/a\beta - n^2/a = n^2/\gamma - n^2/a.$$

Using $H_3$ and (35), the second term is zero. The third term is greater than or equal to zero.
(It equals zero when $s = t$ since $H_3$ and (35) imply that $a_i = (a/n)\eta_i$. When $s > t$, however,
$\eta_i/a_i \ne n/a$ in general.) Therefore, $S_1 \ge n^2/\gamma - n^2/a \ge P_1$.

Thus, in all cases when $\hat n_P$ is valid,

$$E[(\hat n - n)^2] \sim S_1 + S_2 \ge P_1 + P_2 \sim E[(\hat n_P - n)^2]$$

with equality only if $p_j = p$ and $\eta_i/a_i = n/a$. The actual difference $S_1 + S_2 - P_1 - P_2$ is easily
found from the above and, in practice, can be quite substantial.


6. ANALYSIS OF SOCKEYE SALMON DATA 


6-1. In an experiment reported by Schaeffer (1951), both stratifications were with 
respect to time instead of place. The population comprised all adult sockeye salmon who 
passed a certain point of a river during a period of s = 8 weeks on their way up-stream to 
their spawning grounds. The fish were sampled and tagged according to the week in which 
they passed this point. Provided they succeed in reaching the spawning grounds, most 
adult salmon die after spawning. In this case, the deaths took place over a period of t = 9 
weeks, and, during each of these weeks, a number of dead fish were recovered, presumably 
very soon after death. As Schaeffer’s paper is not easily obtainable for reference, the data of 
his experiment are reproduced in Table 1. The frequencies in some of the outer weeks are too 
small to be used in what is essentially a large-sample theory and we have therefore reduced 
both $s$ and $t$ to four by grouping the first 3 and last 3 weeks of tagging into single strata and
the first 3 and last 4 weeks of recovery into single strata. The new values of $\{a_i\}$, $\{b_j\}$, $\{c_{ij}\}$ are
given in Table 2.

$\{n^*_j\}$ and $\{p^*_j\}$ have the usual interpretations but $\theta^*_{ij}$ now signifies the probability of dying
in the $j$th stratum for a fish tagged in the $i$th and $\phi^*_i$ the probability of dying on the spawning
grounds during the 9 weeks period. $1 - \phi^*_i$ therefore represents the probability of dying
before reaching the spawning grounds or of surviving until after this period. (A small
percentage of salmon do manage to reach the sea alive and return to spawn again. See Jones
(1959).)

It is soon apparent that the Peterson estimate is not valid. For, in testing $H_1$ which
specified proportionality of the vectors $(c_{i.})$, $(a_i - c_{i.})$, we obtain $\chi^2_3 = 16.91$ and the 0.1 %
value is 16.27. The vectors $(c_{.j})$, $(b_j)$ are so obviously not proportional that there is no need to
apply a $\chi^2$ test to see that $H_3$ is unacceptable.


Table 1. Schaeffer's data. $\{c_{ij}\}$ with $\{a_i\}$, $\{b_j\}$; $s = 8$, $t = 9$

Week of                      Week of recovery (j)
tagging (i)      1     2     3     4     5     6     7     8     9   Totals    a_i
     1           1     .     2     .     .     .     .     .     .       3      15
     2           1     3     7     .     .     .     .     .     .      11      59
     3           1    11    33    24     5     1     .     1     .      76     410
     4           .     5    29    79    52     3     2     7     3     180     695
     5           .     .    11    67    77     2    16     7     3     183     773
     6           .     .     .    14    25     3    10     6     2      60     335
     7           .     .     .     .     .     .     1     5     .       6      59
     8           .     .     .     .     .     .     1     .     .       1       5

  Totals         3    19    82   184   159     9    30    26     8     520    2351

    b_j         16   113   718  2664  3317   635  1217   904   368    9952


Table 2. Schaeffer's data. $\{c_{ij}\}$ with $\{a_i\}$, $\{b_j\}$; $s = 4$, $t = 4$

            c_i1   c_i2   c_i3   c_i4   c_i.   a_i - c_i.    a_i   c_i./a_i
  c_1j        59     24      5      2     90       394       484     0.186
  c_2j        34     79     52     15    180       515       695     0.259
  c_3j        11     67     77     28    183       590       773     0.237
  c_4j         0     14     25     28     67       332       399     0.168
  c_.j       104    184    159     73
  b_j        847   2664   3317   3124


6-2. Applying the theory of § 2-5, it is found, on evaluating $\hat\rho = C^{-1}a$, that

$$\hat p_1 = 0.1381, \quad \hat p_2 = 1.9461, \quad \hat p_3 = 0.1947, \quad \hat p_4 = 0.1063.$$

The unsatisfactory value of $\hat p_2$ may be just a symptom of the general inaccuracy of capture-recapture
estimation or it may indicate that the model is incorrect in assuming that the $\phi^*_i$
are equal, this being necessary for the consistency of $\hat p_2$. Both of these explanations are
probably correct but while nothing can be done about the first, we can act on the second. The
$c_{i.}/a_i$ indicate where the possible differences in the $\phi^*_i$ lie, the middle two being appreciably
larger than the outer two. Let us therefore estimate subject only to the two constraints:
$\phi^*_1 = \phi^*_4$, $\phi^*_2 = \phi^*_3$. This necessitates a reduction of $t$ from four to three and, consequently, a
grouping of two of the second-sample strata. It is permissible to group the $j$th and $k$th strata
if (i) $(\theta_{ij}p_j + \theta_{ik}p_k)/(\theta_{ij} + \theta_{ik})$ is independent of $i$, in particular if (ii) $p_j = p_k$, or (iii) $\theta_{ij}/\theta_{ik}$ is
independent of $i$. (ii) cannot be tested but (iii) can as it implies proportionality of the $j$th and
$k$th columns of $(c_{ij})$. In this case, the columns which are nearest to being proportional are
the third and fourth and we therefore group them. (The hypothesis of proportionality is
rejected at the 0.1 % level but, even so, if $p_3$ is not too different from $p_4$, (i) will hold approximately
true.)
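
The evaluation of $\hat\rho = C^{-1}a$ for the $4 \times 4$ data of Table 2 can be sketched as below. This is an illustrative calculation, not the author's working: the helper `solve` is a hypothetical exact Gauss-Jordan routine, and $\hat p_j$ is taken as $1/\hat\rho_j$. Small differences from the printed figures in the last decimal places are to be expected.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Table 2: recaptures c_ij (rows = tagging strata) and numbers tagged a_i.
C = [[59, 24, 5, 2],
     [34, 79, 52, 15],
     [11, 67, 77, 28],
     [0, 14, 25, 28]]
a = [484, 695, 773, 399]

rho = solve(C, a)                      # rho_j estimates 1 / p_j
p_hat = [float(1 / r) for r in rho]
print([round(p, 4) for p in p_hat])
```

The second component exceeds one, which is the unsatisfactory feature discussed in the text.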
6-3. Applying the theory of § 2-4 to

$$C = \begin{pmatrix} 59 & 24 & 7 \\ 34 & 79 & 67 \\ 11 & 67 & 105 \\ 0 & 14 & 53 \end{pmatrix}, \quad
U = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}, \quad
v = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

$$a' = (484, 695, 773, 399), \qquad b' = (847, 2664, 6441)$$

and evaluating $\hat\rho = (UD_a^{-1}C)^{-1}v$, we obtain

$$\hat\rho' = (6.021, 1.607, 6.397) \quad \text{or} \quad \hat p' = (0.1661, 0.6223, 0.1563).$$

$\hat p_2$ now lies in $[0, 1]$ but is still curiously high. The other estimates are

$$\hat\Theta = D_a^{-1}CD_{\hat\rho} = \begin{pmatrix} 0.7339 & 0.0797 & 0.0925 \\ 0.2945 & 0.1827 & 0.6167 \\ 0.0857 & 0.1393 & 0.8689 \\ 0.0000 & 0.0564 & 0.8497 \end{pmatrix}$$

and, summing the rows,

$$\hat\phi_1 = \hat\phi_4 = 0.9061, \qquad \hat\phi_2 = \hat\phi_3 = 1.0939.$$

Also $\hat n' = b'D_{\hat\rho} = (5{,}099, 4{,}282, 41{,}204)$, and $\hat n = 50{,}585$.

The estimated variance-covariance matrix of $\hat\rho$ is

$$\hat\Sigma = \begin{pmatrix} 9.96 & -14.84 & 6.31 \\ -14.84 & 23.58 & -10.32 \\ 6.31 & -10.32 & 4.78 \end{pmatrix}.$$

Note the very high variance of $\hat\rho_2$. It is unlikely that there was any catch dependence in this
experiment so that we need make no mental reservations about $\hat\Sigma$ underestimating $\Sigma$. Next,

$$E[(\hat n - n)(\hat n - n)'] = 10^6 \begin{pmatrix} 7.168 & -33.474 & 34.441 \\ -33.474 & 167.347 & -177.113 \\ 34.441 & -177.113 & 198.694 \end{pmatrix}$$

and $E[(\hat n - n)^2] = 20.916 \times 10^6$.

It is noteworthy that var$(\hat n_2)$ and var$(\hat n_3)$ are each considerably larger than var$(\hat n)$.
Although $\hat n_P$ is invalid, it is worth evaluating it and its variance to compare with $\hat n$ and
var$(\hat n)$. Using the unbiased version $(a+1)b/(c+1)$,

$$\hat n_P = 44{,}927,$$

and $E[(\hat n_P - n_P)^2] = 3.181 \times 10^6$.

The latter is a good deal smaller than var$(\hat n)$ as might be expected since the $\hat p_j$ differ and,
even more so, the $\hat\eta_i/a_i$. For

$$\hat\eta' = b'(UD_a^{-1}C)^{-1}U = (9896, -20{,}728, 46{,}020, 15{,}397).$$
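
The constrained estimates of § 6-3 can be reproduced approximately with exact rational arithmetic. The following is a sketch under stated assumptions: the helper `solve` is hypothetical, and the rows of `U` and the vector `v` are written here so as to encode the constraints $\phi_1 = \phi_4$, $\phi_2 = \phi_3$ and a mean $\phi_i$ of one, which is an assumption about the form of matrix described in § 2-4.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Grouped salmon data (recovery strata 3 and 4 of Table 2 merged).
C = [[59, 24, 7], [34, 79, 67], [11, 67, 105], [0, 14, 53]]
a = [484, 695, 773, 399]
b = [847, 2664, 6441]

# Assumed constraints: phi_1 = phi_4, phi_2 = phi_3, mean of the phi_i = 1.
U = [[1, 0, 0, -1], [0, 1, -1, 0], [Fraction(1, 4)] * 4]
v = [0, 0, 1]

scaled = [[Fraction(c_ij, a_i) for c_ij in row] for row, a_i in zip(C, a)]  # D_a^{-1} C
M = [[sum(u_i * row[j] for u_i, row in zip(u, scaled)) for j in range(3)]
     for u in U]                                                            # U D_a^{-1} C

rho = solve(M, v)                                              # estimates of 1 / p_j
theta = [[x * r for x, r in zip(row, rho)] for row in scaled]  # D_a^{-1} C D_rho
phi = [sum(row) for row in theta]                              # row sums of theta
n_hat = sum(b_j * r for b_j, r in zip(b, rho))                 # total of b' D_rho

print([round(float(r), 3) for r in rho])
print([round(float(x), 4) for x in phi])
print(round(float(n_hat)))
```

Because the arithmetic is exact, the constraints on the $\hat\phi_i$ hold identically rather than to rounding error.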
e 


6-4. Finally, we consider how far we were justified in § 6-2 in inferring that the $\phi_i$ differ.
It can be shown that

$$\operatorname{var}(\hat\phi_i) \sim \mu_i/a_i + \theta_i'(U\Theta)^{-1}UD_\mu D_a^{-1}U'(U\Theta)'^{-1}\theta_i - 2(\mu_i/a_i)\,\theta_i'(U\Theta)^{-1}U\delta_i, \qquad (36)$$

where $\theta_i = (\theta_{ij})$ and $\delta_i = (\delta_{ik})$. Using this formula, we find that the estimated standard error
of $\hat\phi_1$ is 0.1509, and therefore the difference between $\hat\phi_1 = 0.9061$ and 1 is not significant.
However, this non-significance may probably be attributed mainly to the insensitivity of the
estimation. The variances of $\hat\phi_2$, $\hat\phi_3$, $\hat\phi_4$ can be found by formulae similar to (36) and a useful
check on the computation is provided by verifying that they are all the same.


I wish to thank Professors M. S. Bartlett and D. V. Lindley for their very helpful
comments.
The fiducial method and invariance


By D. A. S. FRASER
University of Toronto†


1. INTRODUCTION 


The fiducial method for estimating the value of a parameter was introduced by Fisher in 
1930. In papers since that time Fisher has frequently discussed aspects of the method and 
may have, in the view of some readers, modified or altered his ideas concerning the under- 
lying principles and the more general aspects of the method. The method has been frequently 
criticized adversely in the literature, particularly in recent years. Some of the grounds for 
this criticism are: conflict with the confidence method in certain problems; non-uniqueness 
in certain problems; disagreement with ‘repeated sampling’ frequency interpretations; and 
a lack of a seemingly-proper relationship with a priori distributions. 

In his recent book, Statistical Methods and Scientific Inference, Fisher (1956) has devoted 
considerable space to the fiducial method. He states that an essential ingredient for its use is 
the absence of prior information concerning the value of the parameter being estimated; in 
his words, ‘it is essential to introduce the absence of knowledge a priori as a distinctive datum 
in order to demonstrate completely the applicability of the fiducial method of reasoning to 
the particular real and experimental cases for which it was developed.’ An interpretation of 
one aspect of this requirement might be that all parameter values are equivalent in the way 
in which the frequency distribution of the observable variable is related to the parameter 
value determining that distribution. In §5 this interpretation is formalized and shown to 
imply a mathematical model in which the parameter is related to a group of transformations 
on the sample space. 

In §§ 2 and 3 the mathematical model involving transformations is presented on its own 
merits and in §§4 and 5 the fiducial argument for it is developed. A consequence for this 
model is that the information about the parameter from an observed value of the variable is 
in the form of a frequency distribution, the fiducial distribution, having a frequency inter- 
pretation in terms of a well-defined kind of repeated sampling. This is in agreement with 
Fisher’s statement—‘the fiducial argument uses the observations (only) to change the 
logical status of the parameter from one in which nothing is known of it, and no probability 
statement can be made, to the status of a random variable having a well-defined distribution.’

Another consequence in the special framework is that there is no need to require the 
absence of an a priori distribution for the parameter. For, if the fiducial distribution is com- 
bined in a logical manner with the a priori distribution the result is the a posteriori distribution 
of a Bayesian argument—a reassuring result. This is demonstrated in § 9. 

A further consequence concerns prior information that the parameter value is restricted 
to some specified range. This restriction can be used to condition the fiducial distribution 
yielding a conditioned fiducial distribution. A probability combination of such restrictions 
can in effect generate an a priori distribution and an appropriate combination of conditioned 
fiducial distributions yields the Bayesian a posteriori distribution. This is discussed in § 10. 


† Present address: Department of Statistics, Stanford University, Stanford, California.
In § 11 consideration is given to the case of two separate sets of variables each producing
a fiducial distribution concerning a particular parameter. A logical method is proposed for 
combining such fiducial distributions to yield a resultant fiducial distribution. The same 
resultant distribution is obtained when the fiducial distribution from one set is used as 
a prior distribution for a Bayesian argument on the second. The same distribution is also 
obtained when the combined system yields a sufficient statistic and thereby a fiducial 
distribution. This equivalence of the results from three quite different approaches is a 
reassuring feature for the use of the fiducial method in the transformation framework 
proposed here. 

David Brillinger in an unpublished paper prepared at Princeton University has proved 
that if a fiducial distribution is used to produce an invariant interval or region, then that 
interval or region is also of confidence type. The interesting cases are, however, those in 
which the interval or region is chosen relevant to the particular observed values of the 
variables and hence in general not of invariant form. The usual fiducial solutions for the 
Behrens—Fisher problem and the Creasy—Fieller problem (specified variances and covari- 
ances) are of this form and for these there is a legitimate frequency interpretation. It is 
suggested then that the usual criticisms of the Behrens—Fisher solution are made within a 
certain mathematical framework and that the fault may well lie in the use of that frame- 
work to evaluate fiducial probability. It is further suggested that the fiducial method for 
problems having the transformation form is not merely better but is the only logically 
justifiable method on the grounds that the scientific method requires the use of all
information available.

The frequency interpretation of the fiducial distribution has the form of a conditional 
distribution of possible values for the parameter given the observed values of the variables. 
In this form all the familiar methods for handling conditional distributions are available. 


2. THE TRANSFORMATION MODEL 


The term specification is sometimes used to refer to the basic ingredients of a statistical 
problem—the sample space, the parameter space, and the class of probability distributions. 
In this section a general type of specification based on transformations is presented and
illustrated by a simple example, sampling from a normal distribution. In a later section this 
type of specification will be related to the requirements that Fisher puts forward for the 
application of the fiducial argument. 

Consider a basic sample space on which there are probability distributions indexed by a
parameter $\theta$ which takes values in a parameter space $\Omega$. Suppose this specification admits
a sufficient statistic $x$ taking values in a derived sample space, $\mathcal{X}$. The term sufficient statistic
is used here in the Fisherian sense:† the conditional distribution given the statistic does not
depend on the parameter $\theta$; no reduction can be made on the statistic (no non-trivial function
of the statistic can be taken) without losing the preceding property; the dimension of the
statistic is the same as the dimension of the parameter. A statistic possessing only the first
two properties is termed exhaustive by Fisher (the term minimal sufficient is often used in
more mathematically oriented papers). Further assumptions in this section will, in the
present context, make precise the term dimension.


† As formalized from papers such as Fisher (1934, 1948, 1950).
Suppose now that there is a group $\mathcal{G}$ of transformations on the sample space that is
meaningful or reasonable in terms of the physical problem from which the statistical
problem here considered might have been abstracted. A class $\mathcal{G}$ of transformations is a
group of transformations if the following conditions are fulfilled: if $g$ and $h$ are transformations
belonging to $\mathcal{G}$, then the composite transformation $hg$ obtained by first applying $g$ and
then applying $h$ belongs to $\mathcal{G}$; if $g$ belongs to $\mathcal{G}$, then the inverse transformation $g^{-1}$ exists and
belongs to $\mathcal{G}$.

Also suppose that these transformations relate naturally to the sufficient statistic and
to the parameter; precisely, suppose the following three properties hold. Firstly, the transformations
on the basic sample space induce transformations on the values of the sufficient
statistic: if $g \in \mathcal{G}$ and $x \in \mathcal{X}$, then the transformation $g$ applied to basic sample points that
yield a common value $x$ for the sufficient statistic will carry them into basic sample points
all of which yield a value $gx$ for the sufficient statistic. Thus the transformations can be
viewed as applying to the space $\mathcal{X}$. Secondly, there is one and only one transformation
carrying any point $x$ in $\mathcal{X}$ into any other point $x'$ in $\mathcal{X}$. Then, by relating some arbitrary
reference point, say, $x_0$ in $\mathcal{X}$, to the identity element $e$ in $\mathcal{G}$, an arbitrary point $x$ in $\mathcal{X}$ can be
described by means of the unique transformation that produces $x$ from the reference point
$x_0$. Thus in a sense $\mathcal{X}$ and $\mathcal{G}$ are isomorphic. Thirdly, a transformation $g$ carries a variable $x$
with a $\theta$ distribution into a variable $gx$ having a distribution $g^*\theta$ in $\Omega$ (that is, within the
specification) and there is one and only one transformation such that any point $\theta$ in $\Omega$ is
carried into any other point $\theta'$ in $\Omega$. Then, by using a reference point $\theta_0$ in $\Omega$, a general
parameter value $\theta$ can be described by the unique transformation that produces it from the
reference point $\theta_0$. Thus in a sense $\mathcal{X}$, $\mathcal{G}$, and $\Omega$ are isomorphic one to another.

The simple example of sampling from a normal distribution illustrates the above ideas 
quite well. Let (2,,...,2,,) be a sample from a normal distribution with mean yu and standard 
deviation o with no restrictions on the ranges of u and o. The parameter is 0 = (4, o) and the 
parameter space is (2 = (—00,00) x (0,00), the upper half plane. The combination of the 
sample mean % and the sample standard deviation s is a sufficient statistic in the Fisherian 
sense. Thus the general z is here equal to (%, s) and the sample space % is the upper half plane. 
Notice the identity of the space % and Q. 

For many physical problems yielding this particular specification a linear transformation 
that does not reverse the direction on the axis of measurement has a certain naturalness. 
In fact, such a transformation corresponds to a change of origin or zero for measurement and 
a change of scale or unit for measurement. Usually the origin and unit are a matter of con- 
venticn and in no sense an intrinsic part of the applied problem. 

Let [a, 6] designate a linear transformation that moves the origin by an amount a and 
changes the scale by the positive factor b. This transformation applies on the basic sample 


space as follows [a,b] (xy, ...,%,) = (a+ba,,...,a+b2,). 
The induced transformation on the space of the sufficient statistic is then with simple 
algebra seen to be [a, b] (%, 8) = (a+ bz, bs). 


The class of such transformations 

G = {[a,b]|-w<a<0,0<b< oH} 
is a group; this follows from the closure of Y under the formation of products and inverses, 
the formulas for which are easily seen to be 


[e, d] [a, b] = [c+da, db); [a, by" = [—a/b, 1/6}. 
There is a unique linear transformation carrying a value of the sufficient statistic $(\bar{x}', s')$
into any other value $(\bar{x}'', s'')$. Accordingly, by choice of a reference point, say the particular
simple choice $(0, 1)$, the transformations can be used as name tags for the points in the space
$\mathcal{X}$; in fact the transformation $[\bar{x}, s]$ carries the simple reference point $(0, 1)$ into the general
point $(\bar{x}, s)$, yielding the following correspondence between transformations and sample
points:

$$[\bar{x}, s] \leftrightarrow (\bar{x}, s).$$

Under a linear transformation $[a, b]$, a sample from a normal distribution $(\mu, \sigma)$ is carried
into a sample from the normal distribution $(a + b\mu, b\sigma)$. Thus, in notation suggested by the
earlier part of this section,

$$[a, b]^*(\mu, \sigma) = (a + b\mu, b\sigma).$$

There is a unique transformation carrying any parameter value into any other; accordingly,
the transformations can be used as name tags for parameter points and, with reference point
$(0, 1)$, the correspondence is

$$[\mu, \sigma] \leftrightarrow (\mu, \sigma).$$

Thus the sample space, the transformation space, and the parameter space are isomorphic.
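
The product and inverse formulas, the action on a point of the upper half plane, and the use of transformations as name tags can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the class name `LocScale` and the helpers `name_tag` and `connecting` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocScale:
    """Group element [a, b]: shift the origin by a, rescale by b > 0."""
    a: float
    b: float

    def __matmul__(self, other):
        # Product [c, d][a, b] = [c + d*a, d*b]; `other` is applied first.
        return LocScale(self.a + self.b * other.a, self.b * other.b)

    def inverse(self):
        # [a, b]^{-1} = [-a/b, 1/b].
        return LocScale(-self.a / self.b, 1.0 / self.b)

    def act(self, point):
        # Action on (mean, sd) or on (mu, sigma): (m, s) -> (a + b*m, b*s).
        m, s = point
        return (self.a + self.b * m, self.b * s)


def name_tag(point):
    # The element that carries the reference point (0, 1) into `point`.
    m, s = point
    return LocScale(m, s)


def connecting(p_from, p_to):
    # The unique element carrying p_from into p_to:
    # first send p_from back to (0, 1), then send (0, 1) on to p_to.
    return name_tag(p_to) @ name_tag(p_from).inverse()
```

For instance, `connecting((2.0, 4.0), (10.0, 2.0))` is the single element that moves a sample with mean 2 and standard deviation 4 to one with mean 10 and standard deviation 2.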

A specification satisfying the transformation properties introduced in this section has the 
merit that any sample point has a position relative to any other sample point (as expressed 
by the transformation carrying one into the other) and any parameter point has a position 
relative to any other parameter point. The next two sections will establish that a sample 
point has a position relative to a parameter point and, more important for the fiducial 
approach, a parameter point has a position relative to a sample point. 

Some discussion of transformations and the invariance associated with them may be 
found in Lehmann (1959), Chapter 6. 


3. A MORE GENERAL MODEL 


In problems having merely an exhaustive statistic a natural group of transformations may 
exist and yield the essentials, for present purposes, of the structure in the preceding section. 

Let $\theta$ in $\Omega$ be a parameter indexing the distributions of an exhaustive statistic $x$ in $\mathcal{X}$. Let
$g$ in a group $\mathcal{G}$ be a typical transformation on a basic sample space yielding a transformation
on values of the exhaustive statistic $x$. The group $\mathcal{G}$ of transformations $g$ induces a partition
of the space $\mathcal{X}$ into invariant sets in the following manner. Let $S_x$ be the set of images of $x$
under the transformations in $\mathcal{G}$:

$$S_x = \{x' \mid x' = gx \text{ for } g \text{ in } \mathcal{G}\}.$$

$S_x$ can reasonably be called the trajectory or orbit of $x$ under the group $\mathcal{G}$. It is easily proved
that any $S_x$ contains the point $x$ and that any two $S$’s are either identical or disjoint; the $S$’s
thus form a partition of $\mathcal{X}$.

Suppose that there is a unique transformation carrying any point $x$ in a set $S$ into any
other point $x'$ in that $S$. The transformations can then be used as name tags for points within
a set $S$ (relevant to some reference point).

Also suppose that a transformation $g$ carries a variable $x$ with a $\theta$ distribution into a
variable $gx$ with a distribution $g^*\theta$ in $\Omega$ and that there is a unique transformation such that
any point $\theta$ in $\Omega$ is carried into any other point $\theta'$ in $\Omega$ in the manner just described. The
transformations can then be used as name tags for points in $\Omega$ (relevant to some reference
point).
By taking a very general view of a statistic as a function on a sample space, it is possible to
view $S_x$ as a statistic, that function that produces a set of the partition from a point $x$ in
$\mathcal{X}$. As such, it has a distribution. In fact, here it has a distribution that does not depend on
the parameter $\theta$; it has a fixed distribution. Such a statistic with a fixed distribution is called
an ancillary statistic by Fisher. This property of the statistic $S_x$ is easily demonstrated. For
we note that $S_x = S_{gx}$ for all transformations $g$ in $\mathcal{G}$. If $x$ is treated as a variable with a
$\theta$ distribution, then $gx$ is a variable with a $g^*\theta$ distribution. The identity of $S_x$ and $S_{gx}$ shows
that the distribution is the same; in other words that $S_x$ has the same distribution when
derived from $\theta$ as when derived from $g^*\theta$, and hence the same regardless of the parameter
value.

In such a situation involving an ancillary statistic, Fisher suggests that the problem be 
analyzed conditionally given the ancillary statistic. Some grounds for this may be found in 
Fisher (1948), in Buehler (1959), and in Wallace (1959). A conditional analysis within a set
$S$ and using transformations $\mathcal{G}$ and parameter space $\Omega$ reduces the specification to the form
considered in the preceding section.

As an example let $(x_1, \ldots, x_n)$ be a sample from a distribution with known form but
unknown location parameter $\mu$ and unknown scale parameter $\sigma$. Suppose the distribution
has a density function. The density function then has the form

$$\frac{1}{\sigma} f\!\left(\frac{x - \mu}{\sigma}\right),$$

where the function $f$ is specified.

The variable $(x_1, \ldots, x_n)$ can be reduced to the order statistic† $\{x_1, \ldots, x_n\}$ with the
conditional distribution being independent of the parameters. However, except for special
functions $f$, no further reduction can be made with the conditional distribution remaining
independent of the parameters. Thus, in general, the order statistic $\{x_1, \ldots, x_n\}$ is exhaustive.
For purposes of illustration here it suffices and is simpler to work with $(x_1, \ldots, x_n)$ rather than
the trivial reduction, the order statistic.

As a natural group of transformations for this problem, consider the group $\mathcal{G}$ of linear
transformations discussed in the preceding section. Under the group $\mathcal{G}$ a point $(x_1, \ldots, x_n)$ can
be moved parallel to the vector $(1, \ldots, 1)$ and radially in and out from that vector. These
‘motions’ from an initial point generate a set $S$ of the partition of the sample space. An
element $[a, b]$ of the group $\mathcal{G}$ operates in the following manner:

$$[a, b](x_1, \ldots, x_n) = (a + b x_1, \ldots, a + b x_n),$$
$$[a, b]^*(\mu, \sigma) = (a + b\mu, b\sigma).$$

The conditions proposed earlier in this section are satisfied by this example.

The sets $S$ can be easily indexed and so can the points within a set $S$. Take any location
and scale statistics; a natural choice is $(\bar{x}, s)$, the sample mean and standard deviation. The
statistic $(\bar{x}, s)$ can be used to describe points within a set $S$, and on these points the
transformations operate as follows:

$$[a, b](\bar{x}, s) = (a + b\bar{x}, bs).$$


Take any invariant statistics that complete the description of a sample point; a simple
choice is

$$\left(\frac{x_1 - \bar{x}}{s}, \ldots, \frac{x_n - \bar{x}}{s}\right),$$

which describes the relative positioning of the sample values with respect to $(\bar{x}, s)$. A
‘value’ for this statistic determines a set $S$ of the partition. The above statistic is called by
Fisher a configuration statistic; it is ancillary for this problem.

† The set of $x$ values without regard to the order in which they appear in the sample point $(x_1, \ldots, x_n)$.

The approach suggested earlier in this section is to analyse $(\bar{x}, s)$ relevant to $(\mu, \sigma)$
conditionally given the configuration statistic; in other words, to contemplate only such other
possible samples as have the same configuration as the particular sample observed.
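
The invariance of the configuration statistic under location and scale changes, and the way $(\bar{x}, s)$ moves within an orbit, can be illustrated with a short Python sketch. It is an illustration only; the function names `mean_sd`, `configuration` and `transform` are hypothetical, and the standard deviation is assumed to use the divisor $n - 1$.

```python
import math


def mean_sd(xs):
    # Sample mean and standard deviation (divisor n - 1, an assumption here).
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m, s


def configuration(xs):
    # Standardized residuals ((x_i - mean) / sd): the configuration of the sample.
    m, s = mean_sd(xs)
    return [(x - m) / s for x in xs]


def transform(a, b, xs):
    # The transformation [a, b] applied to each coordinate of the sample.
    return [a + b * x for x in xs]


sample = [1.0, 2.0, 4.0, 7.0]
moved = transform(3.0, 2.5, sample)
# The configuration is unchanged, while (mean, sd) moves to (a + b*mean, b*sd).
same_configuration = all(
    math.isclose(u, v, abs_tol=1e-12)
    for u, v in zip(configuration(sample), configuration(moved))
)
```

Two samples therefore lie in the same set $S$ exactly when their configurations agree, which is why conditioning on the configuration restricts attention to a single orbit.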


4. A PIVOTAL QUANTITY 


Fiducial distributions are usually derived by means of a pivotal quantity. A pivotal 
quantity for a specification is a function of the sufficient statistic and the parameter that 
has a fixed distribution regardless of the value of the parameter. A simple example is the 
$t$-quantity for sampling from a normal distribution, $t = \sqrt{n}(\bar{x} - \mu)/s$; it is a function of the
sufficient statistic $(\bar{x}, s)$ and has a fixed distribution, Student’s with $n - 1$ degrees of freedom.
In the presence of an ancillary statistic, the usefulness of a pivotal quantity for fiducial 
purposes requires that it have a fixed conditional distribution given the ancillary statistic. 

In the statistical literature there are examples of specifications that have yielded several 
essentially different pivotal quantities with the result that several fiducial distributions for 
the parameter have been obtained. Some prominent examples can be found in Mauldon 
(1955) and Tukey (1957). In this section a pivotal quantity will be developed for the 
specification described in § 2 and will be shown to be unique in the following sense: any other 
pivotal quantity invariant in form under the transformations in the group is a function of 
the pivotal quantity developed here. A consequence of this result is that if the principle is 
accepted that estimation procedures should be invariant with respect to natural trans- 
formations leaving the specification invariant then the fiducial distribution will be uniquely 
determined. For the Mauldon example there are groups that fulfil the requirements in § 2. 
If one of these groups is added to the specification as being a natural ingredient of the 
problem, then with reference to the principle just mentioned the fiducial distribution is 
uniquely determined. 

The development here will be with reference to the model in § 2. The model in § 3 reduces 
to the simpler model if the sample space is restricted to that given conditionally by the 
ancillary statistic. 

Let $x_0$, $\theta_0$ be arbitrary but fixed reference points in the sample space $\mathcal{X}$ and the parameter
space $\Omega$. Elements of the group $\mathcal{G}$ can then be used to describe sample and parameter points.
Let $g_x$ be the unique element of $\mathcal{G}$ that carries $x_0$ into a general sample point $x$, and similarly
let $h_\theta$ be the element that carries $\theta_0$ into a general parameter point $\theta$.

As a transformation on $\Omega$, $h_\theta$ carries $\theta_0$ into $\theta$. Therefore, as a transformation on $\mathcal{X}$, $h_\theta$ must
carry a variable with a $\theta_0$ distribution into a variable with a $\theta$ distribution. The inverse $h_\theta^{-1}$
then carries a variable with a $\theta$ distribution into a variable with a $\theta_0$ distribution.

Consider now $x = g_x x_0$ as a variable with a $\theta$ distribution. Applying the transformation
$h_\theta^{-1}$ produces a variable $h_\theta^{-1} g_x x_0$ with a $\theta_0$ distribution. This variable is a function of $\theta$ and
$x$ and it has a fixed distribution regardless of the value of the parameter. It is generated by
the random variable $h_\theta^{-1} g_x$ treated as a random transformation and applied to the fixed
reference point $x_0$.

Thus $g = h_\theta^{-1} g_x$ is a function of $x$ and $\theta$ taking values in $\mathcal{G}$ and having a fixed distribution
when $x$ is treated as a variable with a $\theta$ distribution; it is a pivotal quantity. As a function of
$x$ and $\theta$ it is invariant under transformations in $\mathcal{G}$. For, let $f$ be a transformation in $\mathcal{G}$;
under $f$, $x = g_x x_0$ is transformed into $fx = f g_x x_0$, and a distribution $\theta = h_\theta \theta_0$ for $x$ becomes
the distribution $f\theta = f h_\theta \theta_0$ for $fx$. Hence

$$h_\theta^{-1} g_x = (f h_\theta)^{-1} (f g_x).$$

Any function of $x$ and $\theta$ that is invariant with respect to $\mathcal{G}$ is now shown to be a function of the
pivotal quantity $h_\theta^{-1} g_x$. Let $P(x; \theta)$ be invariant with respect to $\mathcal{G}$: $P(fx; f\theta) = P(x; \theta)$ for $f$ in
$\mathcal{G}$. Using this invariance yields

$$\begin{aligned}
P(x; \theta) &= P(g_x x_0; h_\theta \theta_0) \\
&= P(h_\theta^{-1} g_x x_0; h_\theta^{-1} h_\theta \theta_0) \\
&= P(h_\theta^{-1} g_x x_0; \theta_0).
\end{aligned}$$

Thus $P(x; \theta)$ has been expressed as a function of $h_\theta^{-1} g_x$. From this it follows that any
invariant pivotal quantity is a function of the particular pivotal quantity $h_\theta^{-1} g_x$. In its
essentials then in terms of its usefulness, the pivotal quantity $h_\theta^{-1} g_x$ is unique: any other
pivotal quantity is a reduction from it.
Consider the example involving sampling from a normal distribution. Using $(0, 1)$ as the
reference point in both the sample and parameter spaces produces

$$g_x = [\bar{x}, s], \qquad h_\theta = [\mu, \sigma].$$

The pivotal quantity then has the form

$$\begin{aligned}
h_\theta^{-1} g_x &= [\mu, \sigma]^{-1} [\bar{x}, s] \\
&= \left[-\frac{\mu}{\sigma}, \frac{1}{\sigma}\right] [\bar{x}, s] \\
&= \left[\frac{\bar{x} - \mu}{\sigma}, \frac{s}{\sigma}\right];
\end{aligned}$$

it is the essentially unique invariant pivotal quantity. Normal distribution theory shows
that its distribution can be described by the pivotal variable

$$g = \left[\frac{z}{\sqrt{n}}, \frac{\chi}{\sqrt{n-1}}\right],$$

where $z$ is a standardized normal variable and $\chi$ is an independent $\chi$-variable on $n - 1$
degrees of freedom.

The distribution of the pivotal variable $g$ can also be obtained by a direct argument using
the general theory. $g$ as a variable in $\mathcal{G}$ must carry the fixed point $x_0 = (0, 1)$ into a variable
having a $\theta_0 = (0, 1)$ distribution, that is, $g(0, 1) = (z/\sqrt{n}, \chi/\sqrt{n-1})$. From this it follows that

$$g = [z/\sqrt{n}, \chi/\sqrt{n-1}].$$
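
The invariance of the pivotal quantity for the normal example, and the fact that Student’s $t$ is a reduction of it, can be checked with a small Python sketch. This is an illustration under the conventions of this section (reference point $(0, 1)$, group elements written as pairs); the function names `act`, `pivot` and `t_quantity` are hypothetical.

```python
import math


def act(f, point):
    # [a, b] applied to (m, s): (a + b*m, b*s); used for both x and theta.
    a, b = f
    m, s = point
    return (a + b * m, b * s)


def pivot(x, theta):
    # The pivotal quantity [(xbar - mu)/sigma, s/sigma], written as a pair.
    xbar, s = x
    mu, sigma = theta
    return ((xbar - mu) / sigma, s / sigma)


def t_quantity(n, x, theta):
    # Student's t as a function of the pivotal quantity: sqrt(n) * first / second.
    first, second = pivot(x, theta)
    return math.sqrt(n) * first / second


x = (5.0, 2.0)        # observed (mean, sd)
theta = (3.0, 4.0)    # a parameter value (mu, sigma)
f = (7.0, 0.5)        # an arbitrary group element [7, 0.5]

before = pivot(x, theta)
after = pivot(act(f, x), act(f, theta))   # transform sample and parameter together
```

Here `before` and `after` coincide, as the invariance argument requires, and `t_quantity` depends on $(x, \theta)$ only through `pivot`.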

The general discussion earlier in this section showed that the frequency distribution for
$x$ produced a fixed frequency distribution for the pivotal variable $g$. What is perhaps of even
greater interest is that this fixed distribution of the pivotal variable when used in conjunction
with the pivotal equation $g = h_\theta^{-1} g_x$ completely describes the specification of the problem. For,
let $g$ be a variable having the fixed pivotal distribution. The equation

$$g = h_\theta^{-1} g_x$$

can be solved yielding

$$g_x = h_\theta g,$$
$$x = g_x x_0 = h_\theta g x_0,$$

which demonstrates that the general frequency distribution for $x$ is obtained by a parameter
transformation ($h_\theta$) applied to $g x_0$; $g x_0$ is a variable on the sample space $\mathcal{X}$ obtained by a
random ‘error’ transformation $g$ applied to the reference point $x_0$. The general distribution
is thus in a sense merely a relocation of an ‘error’ distribution about an arbitrary reference
point $x_0$.

5. THE FIDUCIAL DISTRIBUTION 

A fiducial distribution purports to be a frequency distribution of possible values for a 
parameter as obtained, through frequency information summarized in the specification, 
from an observed value of a variable. The technique for deriving a fiducial distribution is 
the following: the observed value of the sufficient statistic is substituted into the pivotal 
equation; the pivotal variable is treated as a variable with the frequency distribution 
normally associated with it; the parameter in the equation is treated as a free variable and 
the distribution of the pivotal variable is transferred to it by the pivotal equation. 

Let $g$ be a variable with the fixed pivotal variable distribution; let $x$ be the observed
value of the sufficient statistic; and let $\vartheta$ be a variable designating possible values for the
parameter in the light of the frequency and observational information. The pivotal equation
can be solved† for $\vartheta$ in terms of $g$ and $x$:

$$g = h_\vartheta^{-1} g_x,$$
$$h_\vartheta = g_x g^{-1},$$
$$\vartheta = g_x g^{-1} \theta_0.$$

This last equation gives the fiducial distribution for $\vartheta$ as obtained from the fixed distribution
of the pivotal variable.

In an application of this theory the parameter $\theta$ has a particular but unknown value. This
value determines the distribution of the sufficient statistic which in turn produces the
observed value $x$. In such a situation $\theta$ is certainly fixed but this does not mean‡ that
probability statements cannot be made concerning $\theta$.

In §4 it was shown that the frequency information available concerning the sufficient 
statistic relative to the parameter can be summarized in the fixed distribution of the pivotal 
variable. In terms of repeated sampling from this fixed distribution for the ‘error’ variable $g$,
there is generated a frequency distribution of possible parameter values $\vartheta$ corresponding to
the observed $x$. This is the fiducial distribution and it has here a frequency interpretation.
This interpretation will be elaborated upon in terms of the example involving sampling from 
a normal distribution; the interpretation, however, applies generally. 

Consider again the example. The pivotal variable was obtained in § 4; its distribution can
be expressed by

$$g = [z/\sqrt{n}, \chi/\sqrt{n-1}].$$


A frequency distribution is generated when this random ‘error’ transformation is applied to 
the reference point (0, 1) and the frequency distribution of the observable variable is then 
obtained by a transformation on the sample space where the transformation is determined 
by the parameter. 


† The preceding section contains the solution for $x$ in terms of $g$ and $\theta$; it describes the specification
directly.

‡ Consider a gambling game in which the dice are rolled but concealed from view; the bets made;
and the dice exposed. With honesty this game is equivalent to one in which the bets are made before
the dice are rolled. Even if a person contemplated only one play I doubt that valid grounds could be
given for preferring one to the other.
The formulas earlier in this section produce the fiducial distribution

$$\begin{aligned}
(\mu, \sigma) &= [\bar{x}, s]\, g^{-1} (0, 1) \\
&= [\bar{x}, s] \left[\frac{z}{\sqrt{n}}, \frac{\chi}{\sqrt{n-1}}\right]^{-1} (0, 1) \\
&= \left(\bar{x} - t\,\frac{s}{\sqrt{n}},\ \frac{s\sqrt{n-1}}{\chi}\right),
\end{aligned}$$

where the $t$-variable implicitly defined, $t = z\sqrt{n-1}/\chi$, is statistically dependent on the $\chi$ variable. The
fiducial variables $\mu$, $\sigma$ are obtained from the observed values $\bar{x}$, $s$ by the action of the pivotal
variables $z$, $\chi$. It is interesting to note that the fiducial distribution of $\mu$ is centred at $\bar{x}$, is
scaled by $s/\sqrt{n}$, and has the form of Student’s distribution on $n - 1$ degrees of freedom.
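
The fiducial distribution for the normal example can be simulated by drawing the pivotal variables and applying $\vartheta = g_x g^{-1}\theta_0$. The following Python sketch assumes the reference point $(0, 1)$ and the pivotal form $g = [z/\sqrt{n}, \chi/\sqrt{n-1}]$ used in this section; the function names `fiducial_point` and `draw_fiducial` are hypothetical.

```python
import math
import random


def fiducial_point(xbar, s, n, z, chi):
    # Apply [xbar, s] g^{-1} to (0, 1), with g = [z/sqrt(n), chi/sqrt(n-1)].
    t = z * math.sqrt(n - 1) / chi
    mu = xbar - t * s / math.sqrt(n)
    sigma = s * math.sqrt(n - 1) / chi
    return mu, sigma


def draw_fiducial(xbar, s, n, rng):
    # z: standard normal; chi: root of a sum of n - 1 squared standard normals.
    z = rng.gauss(0.0, 1.0)
    chi = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1)))
    return fiducial_point(xbar, s, n, z, chi)


rng = random.Random(0)
draws = [draw_fiducial(10.0, 2.0, 5, rng) for _ in range(1000)]
```

Each element of `draws` is a pair $(\mu, \sigma)$; the $\mu$ coordinates follow a Student form centred at $\bar{x}$ and scaled by $s/\sqrt{n}$, and the two coordinates are dependent because both involve the same $\chi$.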

Suppose a performance of an experiment involving a sample of $n$ from a normal distribution
has yielded the observed values $\bar{x}$, $s$. Now contemplate many past and possible
performances of experiments involving samples of $n$ from normal distributions. For most such
performances the origin and scaling are purely conventional. To make these other
performances comparable to the particular performance in hand, transform each by rescaling
and relocating so that the sample mean in each case is moved to the value $\bar{x}$ and the standard
deviation to the value $s$. In each case apply conceptually the transformation to the mean of
the distribution to yield a value that is suitable for comparison with the values $\bar{x}$, $s$. The
collection of these transformed means generates a frequency distribution and it is fairly
easily seen that this frequency distribution is the fiducial $t$-distribution obtained in the
preceding paragraph. From this point of view then the fiducial distribution is a frequency
distribution of possible values for the parameter relevant to the particular observed $\bar{x}$, $s$.
It seems quite proper then to use this distribution to make probability statements in which
$\mu$, $\sigma$ appear formally as variables. Such probability statements would be consistent with
Fisher’s operational definition of probability; see, for example, Fisher (1959).

The interpretation of the fiducial distribution for the general case proceeds in a similar
manner. The mathematics associated with it takes the following form. Let $x$ be an observed
value of the sufficient statistic. Let $X$ be the observable sufficient statistic with distribution
given by

$$X = h_\theta g x_0,$$

where the randomness is produced by the ‘error’ variable $g$, $h_\theta$ is the parameter
transformation, and $x_0$ is the reference point. $\theta$ need not be fixed; but a value for $\theta$ and an observed
value for $g$ produce an observed value for $X$.

The transformation that carries $X$ into $x$ is $g_x g^{-1} h_\theta^{-1}$, as is easily checked:

$$\begin{aligned}
g_x g^{-1} h_\theta^{-1} X &= g_x g^{-1} h_\theta^{-1} h_\theta g x_0 \\
&= g_x x_0 = x.
\end{aligned}$$

Applying this transformation to $\theta$ produces

$$g_x g^{-1} h_\theta^{-1} \theta = g_x g^{-1} \theta_0,$$

which is a variable with distribution generated by the ‘error’ variable $g$. This frequency
distribution of possible values for $\theta$ is just the fiducial distribution that was developed
earlier in this section.

With the preceding interpretation for fiducial probability it seems quite proper to refer 
to it as an a posteriori distribution obtained in the absence of an a priori distribution. 


6. RELATIONSHIP TO FISHER’S REQUIREMENTS 


Throughout his writings on fiducial inference, Fisher has emphasized that the method be 
based on an exhaustive statistic, in fact, that it be based either on a sufficient statistic or on 
an exhaustive statistic that has an ancillary so that with respect to the ancillary the 
exhaustive statistic is conditionally sufficient. This is the basic assumption for the specifica- 
tions in §§ 2 and 3. 

Near the beginning of §1 there is a quotation from Fisher describing an ‘essential 
ingredient’ for the use of the fiducial method. Following this, there appears my inter- 
pretation of one aspect of this ingredient—‘that all parameter values are equivalent in the 
way in which the frequency distribution of the observable variable is related to the para- 
meter value determining that distribution’. Implicitly then, there must be a means for 
comparing the frequency distribution for one value of $\theta$ with the frequency distribution for
another value of $\theta$. The only reasonable way of making such comparisons, seemingly, is by
means of transformations on the sample space:

Assumption 1. There is a group $\mathcal{G} = \{g\}$ of 1-1 measurable transformations on the sample
space $\mathcal{X}$ of the sufficient statistic onto itself.

That the set of transformations be a group follows from the reasonable requirement that 
the set of transformations be closed under the formation of products and inverses. 

The purpose in having transformations is to permit comparison of various distributions for 
the sufficient statistic x. A transformation on the sample space must then carry one possible 
distribution for the sufficient statistic into another possible distribution for it: 

Assumption 2. The class, indexed by $\theta$ in $\Omega$, of probability distributions for $x$ is closed under
the transformations in $\mathcal{G}$.

If $g$ is a transformation in $\mathcal{G}$ and $x$ has a distribution determined by $\theta$ then the variable
$gx$ has a distribution on the sample space $\mathcal{X}$ and by the assumption there is a parameter value
for it: $g^*\theta$. The transformation $g^*$ on $\Omega$ describes the changes in parameter values
corresponding to the changes in distribution determined by $g$. The transformations $\mathcal{G}^* = \{g^*\}$
form a group and the mapping $g \to g^*$ is a homomorphism; see, for example, Lehmann (1959).

If the comparison of possible distributions for $x$ is simple and direct, there will be a single
transformation that carries any one parameter value into any other parameter value:

Assumption 3. For any $\theta, \theta'$ in $\Omega$ there is a single $g$ in $\mathcal{G}$ such that $g^*\theta = \theta'$. In terms of
group-theory properties of $\mathcal{G}$ and $\mathcal{G}^*$, this can be expressed alternatively as follows: the
mapping $g \to g^*$ is an isomorphism (that is, a 1-1 correspondence and this obtains if the
identity transformation is the only transformation that does not change the distribution of
the variable $x$); and the group $\mathcal{G}^*$ is exactly transitive on $\Omega$ (that is, for any $\theta, \theta'$ there is a
single $g^*$ in $\mathcal{G}^*$ carrying $\theta$ into $\theta'$).
A sufficient statistic in Fisher’s sense is an exhaustive statistic having the same
dimensions as the parameter. Assumption 3 by simple analysis establishes isomorphisms between
$\mathcal{G}$, $\mathcal{G}^*$, $\Omega$. The following assumption extends the isomorphisms to $\mathcal{X}$ and in this context gives
a precise meaning to ‘same dimensions’:

Assumption 4. $\mathcal{G}$ is exactly transitive on $\mathcal{X}$.

These assumptions produce the structure described in § 2, and consequently the fiducial 
development in §§ 4 and 5. The assumptions are based on the use of a sufficient statistic and 
on my interpretation of one aspect of Fisher’s essential ingredient. In this framework the 
fiducial development and interpretation in the preceding sections, and the fiducial properties 
in succeeding sections are remarkably coherent and consistent, and to me lend strong support 
to the pre-eminent position in which Fisher places fiducial inference. 


7. INVARIANT MEASURES 


The specifications in the preceding sections have been described vaguely in terms of 
‘frequency distributions’ or ‘probability distributions’. For most problems of practical 
interest it is possible to describe frequency distributions by means of density functions with 
respect to a fixed or ‘carrying’ measure on the sample space. Here, with transformations on 
the space, the carrying measure would be transformed into other carrying measures on the 
space. Also, with inverse transformations belonging to the group, it follows that these
different carrying measures would be absolutely continuous one with respect to another (one
expressible by means of a density function with respect to another; see, for example, Halmos
(1950)). For simplicity of representation this suggests the use of a measure whose value is
invariant when a transformation in a group is applied to a set in the sample space. Measure 
theory (for example, Halmos (1950)) establishes the existence of such an invariant measure 
in a quite general setting. 

The use of density functions also facilitates the argument in succeeding sections which are 
concerned with prior distributions and with the combination of fiducial distributions. 

A measure $\mu$ on a group $\mathcal{G}$ is said (Halmos (1950), Loomis (1953)) to be a left invariant
measure or a left Haar measure if it satisfies

$$\mu(G) = \mu(gG)$$

for all elements $g$ in $\mathcal{G}$ and all Borel sets $G$ contained in $\mathcal{G}$; $gG$ is the set obtained when each
element in $G$ is multiplied on the left by $g$. Such a measure exists if the group satisfies:

Assumption 5. The group $\mathcal{G}$ has a locally compact metric topology and the group operations
are continuous with respect to the metric.

Measure theory establishes the essential uniqueness of left Haar measures, uniqueness in
the sense that for any two left Haar measures one can be expressed as a constant times the
other.

Let $G^{-1}$ be the set of elements $g^{-1}$ corresponding to elements $g$ in $G$. Also, let $\nu$ be a measure
on the group $\mathcal{G}$ defined by

$$\nu(G) = \mu(G^{-1}),$$

where $\mu$ is a left Haar measure. It is easily shown that $\nu$ has the property

$$\nu(G) = \nu(Gg)$$

for all $g$ in $\mathcal{G}$ and all Borel sets $G$ contained in $\mathcal{G}$. By analogy such a measure is called a
right invariant or right Haar measure.
Let $\mu_g$ be defined by

$$\mu_g(G) = \mu(Gg);$$

$\mu_g$ is easily shown to be a measure. It satisfies the property of left invariance and hence by
the uniqueness theorem must be a constant (which can of course depend on the $g$ entering its
definition) times the measure $\mu$:

$$\mu(Gg) = \Delta(g)\,\mu(G).$$

This positive real valued function $\Delta(g)$ defined on $\mathcal{G}$ is called the modular function of the
group $\mathcal{G}$. Simple analysis establishes the following properties of the modular function:

$$\Delta(gh) = \Delta(g)\,\Delta(h), \qquad \Delta(g^{-1}) = 1/\Delta(g),$$
$$\nu(gG) = \Delta(g^{-1})\,\nu(G), \qquad \Delta(e) = 1,$$

where $e$ is the identity element in the group.

The prime importance of the modular function is that it gives the relationship between
left and right Haar measures: either can be expressed by means of a density function with
respect to the other,

$$\mu(G) = \int_G d\mu(g) = \int_G \Delta(g)\, d\nu(g),$$
$$\nu(G) = \int_G d\nu(g) = \int_G \Delta^{-1}(g)\, d\mu(g),$$

or more briefly $d\mu = \Delta\, d\nu$, $d\nu = \Delta^{-1}\, d\mu$.

The example in § 2 was discussed in terms of location and scale transformations. These
transformations form a group with elements $[a, b]$, where $-\infty < a < \infty$, $0 < b < \infty$. The left
Haar measure for this group has element

$$\frac{da\, db}{b^2};$$

the invariance of this is easily checked. (The Jacobian of the linear transformation
$[A, B] = [c, d][a, b]$ is $d^2$; also $B^2 = d^2 b^2$.) The right Haar measure element is

$$\frac{da\, db}{b}$$

and the modular function is $\Delta([a, b]) = 1/b$.
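
The invariance of these two measure elements and the value of the modular function can be checked pointwise by a change of variables. The Python sketch below is an illustration only: densities are taken relative to $da\,db$, the Jacobians of the left and right translations are written out by hand, and all function names are hypothetical.

```python
import math


def left_density(a, b):
    # Density of the left Haar element da db / b^2 relative to da db.
    return 1.0 / b ** 2


def right_density(a, b):
    # Density of the right Haar element da db / b relative to da db.
    return 1.0 / b


def modular(a, b):
    # Modular function Delta([a, b]) = 1/b.
    return 1.0 / b


def left_translate(c, d, a, b):
    # [c, d][a, b] = [c + d*a, d*b]; its Jacobian in (a, b) is d^2.
    return (c + d * a, d * b), d ** 2


def right_translate(c, d, a, b):
    # [a, b][c, d] = [a + b*c, b*d]; its Jacobian in (a, b) is d.
    return (a + b * c, b * d), d


def pulled_back(density, translate, c, d, a, b):
    # Density at (a, b) of the set function G -> measure(translate of G).
    (a2, b2), jacobian = translate(c, d, a, b)
    return density(a2, b2) * jacobian


def check_at(c, d, a, b):
    left_ok = math.isclose(pulled_back(left_density, left_translate, c, d, a, b),
                           left_density(a, b))
    right_ok = math.isclose(pulled_back(right_density, right_translate, c, d, a, b),
                            right_density(a, b))
    # mu(Gg) = Delta(g) mu(G) with g = [c, d].
    modular_ok = math.isclose(pulled_back(left_density, right_translate, c, d, a, b),
                              modular(c, d) * left_density(a, b))
    return left_ok and right_ok and modular_ok
```

A call such as `check_at(1.5, 4.0, -2.0, 0.5)` returns `True`; the relation $d\mu = \Delta\, d\nu$ appears as `left_density(a, b) == modular(a, b) * right_density(a, b)`.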





8. THE SPECIFICATION IN TERMS OF DENSITY FUNCTIONS 
In this section a density function is assumed for the pivotal variable $g$ and is used to
produce expressions for the density functions of $x$ given $\theta$ and of $\theta$ given $x$.
In § 2 it was shown that the sample space $\mathcal{X}$ could be identified with the group $\mathcal{G}$ by taking
a reference point $x_0$ in $\mathcal{X}$. The identification then has the form

$$x_0 \to e, \qquad x \to g_x,$$

where $g_x$ is the group element satisfying $x = g_x x_0$. Thus, in effect the sample space has the
structure of the group $\mathcal{G}$. A simplification of notation is then obtained if $x$ is used to designate
the group element $g_x$. In a similar manner the parameter values can be replaced by group
elements $\theta_0 \to e$, $\theta \to h_\theta$ and the notation then simplified by using $\theta$ to designate the group
element $h_\theta$. Thus the sufficient statistic $x$, the transformation $g$, the parameter $\theta$ range over
the spaces $\mathcal{X}$, $\mathcal{G}$, $\Omega$ which all have the structure of the group $\mathcal{G}$.
In this simplified notation the pivotal equation is

$$g = \theta^{-1} x$$

and its solutions are $x = \theta g$; $\theta = x g^{-1}$.
Suppose now that the sufficient statistic $x$ has a distribution expressible by means of a
density function with respect to left Haar measure. The distribution of the pivotal variable
$g$ is in fact the distribution of $x$ for the case in which the parameter $\theta = e$; let $p(g)$ be the
probability density function in this case:

$$\Pr\{g \text{ in } G\} = \int_G p(g)\, d\mu(g).$$

The general distribution of $x$ can then be obtained from the pivotal equation $x = \theta g$:

$$\begin{aligned}
\Pr\{x \text{ in } X \mid \theta\} &= \Pr\{\theta g \text{ in } X\} \\
&= \Pr\{g \text{ in } \theta^{-1}X\} \\
&= \int_{\theta^{-1}X} p(g)\, d\mu(g) \\
&= \int_{x \text{ in } X} p(\theta^{-1}x)\, d\mu(\theta^{-1}x) \\
&= \int_X p(\theta^{-1}x)\, d\mu(x),
\end{aligned}$$

the last step following from the left invariant properties of $\mu$. The essentials of this derivation
can be exhibited in formal manipulation of probability elements. The variable $x$ is generated
from the variable $g$ by the pivotal equation; the probability element is

$$p(g)\, d\mu(g) = p(\theta^{-1}x)\, d\mu(\theta^{-1}x) = p(\theta^{-1}x)\, d\mu(x).$$

Thus the density function for $x$ given $\theta$ is $p(\theta^{-1}x)$ with respect to left Haar measure.
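The pivotal equation solved for $\theta$ is $\theta = xg^{-1}$. The fiducial distribution for $\theta$ is generated
by this equation from the variable $g$; $x$ is kept fixed:

$$\begin{aligned}
\Pr\{\theta \text{ in } \Theta \mid x\} &= \Pr\{xg^{-1} \text{ in } \Theta\} \\
&= \Pr\{g \text{ in } \Theta^{-1}x\} \\
&= \int_{\Theta^{-1}x} p(g)\, d\mu(g) \\
&= \int_{\theta \text{ in } \Theta} p(\theta^{-1}x)\, d\mu(\theta^{-1}x) \\
&= \int_\Theta p(\theta^{-1}x)\, \Delta(x)\, d\mu(\theta^{-1}) \\
&= \int_\Theta p(\theta^{-1}x)\, \Delta(x)\, d\nu(\theta).
\end{aligned}$$

The essentials of this derivation can be exhibited in formal manipulation of probability
elements:

$$p(g)\, d\mu(g) = p(\theta^{-1}x)\, d\mu(\theta^{-1}x) = p(\theta^{-1}x)\, \Delta(x)\, d\nu(\theta).$$

Thus the density function for $\theta$ given $x$ is essentially $p(\theta^{-1}x)$ with respect to right Haar
measure; $\Delta(x)$ is a normalizing constant to make the total integral equal to 1.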


In a sense it is natural to use right Haar measure as a basis for $\theta$ and left Haar measure as
a basis for $g$ and for $x$. For, let $G$ be a set of values for $g$ and define $X$ and $\Theta$ by

$$\begin{aligned}
G &= \theta^{-1}X \quad \text{for } \theta \text{ fixed} \\
&= \Theta^{-1}x \quad \text{for } x \text{ fixed.}
\end{aligned}$$

Then

$$\begin{aligned}
\mu(G) &= \mu(\theta^{-1}X) = \mu(X) \\
&= \mu(\Theta^{-1}x) = \Delta(x)\,\mu(\Theta^{-1}) = \Delta(x)\,\nu(\Theta).
\end{aligned}$$

Thus left Haar measure for $x$ and $g$ relates naturally to right Haar measure for $\theta$.


9. COMBINING FIDUCIAL AND PRIOR DISTRIBUTIONS 


Consider a specification of the kind in § 2 and suppose that the specification admits
description in terms of density functions as in the preceding section. Also suppose that prior
information concerning the parameter can be summarized in terms of an a priori frequency
distribution with probability measure $N(\theta)$, and that this admits a density function $n(\theta)$
with respect to the ‘natural’ right Haar measure. The a priori probability element for $\theta$ is
then given by

$$n(\theta)\, d\nu(\theta).$$

The fiducial probability element for possible values of $\vartheta$ given $x$ is

$$p(\vartheta^{-1}x)\, \Delta(x)\, d\nu(\vartheta).$$

The joint probability element for $\theta$ and $\vartheta$ on $\Omega \times \Omega$ is

$$p(\vartheta^{-1}x)\, \Delta(x)\, n(\theta)\, d\nu(\theta)\, d\nu(\vartheta).$$

But $\theta$ and $\vartheta$ need to be identified, since the prior distribution produced the $\theta$ that yielded the
$x$ observed; taking the conditional distribution along $\theta = \vartheta$ with respect to the base measure
$\nu$ yields the following relative density for $\theta$ given $x$:

$$p(\theta^{-1}x)\, n(\theta),$$

which when normalized yields the probability element

$$\frac{p(\theta^{-1}x)\, n(\theta)\, d\nu(\theta)}{\int_\Omega p(\eta^{-1}x)\, n(\eta)\, d\nu(\eta)}.$$

It is interesting to note that the above ‘logical’ combination of fiducial and a priori
distributions is just the ordinary a posteriori distribution for $\theta$. The joint probability element
for $x$ and $\theta$ is

$$p(\theta^{-1}x)\, n(\theta)\, d\mu(x)\, d\nu(\theta),$$

which yields the following a posteriori probability element for $\theta$ given $x$:

$$\frac{p(\theta^{-1}x)\, n(\theta)\, d\nu(\theta)}{\int_\Omega p(\eta^{-1}x)\, n(\eta)\, d\nu(\eta)},$$

which is the expression just derived by ‘logical’ combination.

Fisher’s ‘essential ingredient’ in its entirety requires the absence of all prior information. 
In the present framework based on one aspect of Fisher’s ‘ingredient’ there is no need to 
require the absence of an a priori distribution, since it can be combined with the fiducial 
information in a logical manner and be completely consistent with a Bayesian analysis. 
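
As a small formal check on the combined element, one can take $n(\theta)$ constant. This is an assumption made here only for illustration: such an $n$ is not a proper frequency distribution, so the step is purely formal. Since the fiducial element of § 8 integrates to 1, the normalizing integral can be evaluated and the combined element reduces to the fiducial element itself:

```latex
\int_\Omega p(\eta^{-1}x)\,\Delta(x)\,d\nu(\eta) = 1
\;\Longrightarrow\;
\int_\Omega p(\eta^{-1}x)\,d\nu(\eta) = \frac{1}{\Delta(x)},
\qquad
\frac{p(\theta^{-1}x)\,d\nu(\theta)}{\int_\Omega p(\eta^{-1}x)\,d\nu(\eta)}
  = p(\theta^{-1}x)\,\Delta(x)\,d\nu(\theta).
```

In this formal sense a density that is uniform with respect to right Haar measure leaves the fiducial distribution unchanged.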
10. CONDITIONING FIDUCIAL DISTRIBUTIONS


Suppose that the information available before an observation is taken on the sufficient
statistic $x$ is such as to restrict the range of possible values of the parameter to some subset
$\Omega_0$ of $\Omega$. Fisher’s essential ingredient, interpreted strictly, would then say that the fiducial
argument was not applicable. Fisher himself, however, in an example in Statistical Methods 
and Scientific Inference (Fisher, 1956), has derived a fiducial distribution in such a situation. 
Within the framework introduced in this paper, a restriction on the range of the parameter 
seems entirely consistent with the use of the fiducial method. 

The fiducial distribution for $\theta$ given $x$ can be described by $\theta = g_x g^{-1}\theta_0$, where $g$ has the
pivotal distribution. In the notation of Section 8 this produces the fiducial probability
element
$$p(\theta^{-1}x)\,\Delta(x)\,d\nu(\theta),$$
where $\nu$ is right Haar measure on $\mathcal{G}$. The argument in § 5 gave a frequency interpretation to
this fiducial probability; it seems natural then to impose the condition ‘$\theta$ in $\Omega_0$’ as a condition
on the fiducial distribution to obtain a conditioned fiducial distribution with element
$$\frac{\phi_{\Omega_0}(\theta)\,p(\theta^{-1}x)\,d\nu(\theta)}{\int \phi_{\Omega_0}(\eta)\,p(\eta^{-1}x)\,d\nu(\eta)},$$


where $\phi_{\Omega_0}(\theta) = 1, 0$ according as $\theta$ belongs to, does not belong to, $\Omega_0$. This has a frequency
interpretation by means of a straightforward restriction to $\Omega_0$ of the frequency interpretation
in the unrestricted case.
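
A minimal numerical sketch of this conditioning, assuming the location case on the real line with a standard normal pivot and taking $\Omega_0$ to be the non-negative half-line: the fiducial density is multiplied by the indicator of $\Omega_0$ and renormalized. The grid, the observed value and the function names are hypothetical choices for illustration.

```python
import math

def fiducial_density(theta, x):
    """Unrestricted fiducial density p(x - theta) for a standard normal pivot."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def conditioned_fiducial(x, grid, in_omega0):
    """Multiply by the indicator of Omega_0 and renormalise over the grid."""
    weights = [fiducial_density(th, x) if in_omega0(th) else 0.0 for th in grid]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed example: theta known to be non-negative, observed x = -0.5.
grid = [-8.0 + 0.01 * i for i in range(1601)]
probs = conditioned_fiducial(-0.5, grid, lambda th: th >= 0.0)
mass_outside = sum(w for th, w in zip(grid, probs) if th < 0.0)
```

The excluded region receives no probability, and the probability that the unrestricted distribution placed there is spread proportionally over $\Omega_0$ rather than condensed at a boundary point.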

In the example on page 133 of Statistical Methods and Scientific Inference, Fisher has 
proceeded differently and condensed the fiducial probability outside the permissible region 
to a boundary point. Fisher gives no justification for his procedure. 


11. COMBINING FIDUCIAL DISTRIBUTIONS 


Consider two observational systems concerned with a single parameter $\theta$, and suppose
that the specifications have the form described in §8. For one of the systems let $x$ be the
sufficient statistic, let $g$ be the pivotal variable with density function $p(g)$, and let $\theta_1$ be the
fiducial variable derived from an observation on $x$. Similarly, for the second system let $y$ be
the sufficient statistic, let $h$ be the pivotal variable with density function $q(h)$, and let $\theta_2$ be
the fiducial variable derived from an observation on $y$. In this section several methods are
considered for combining the observational information; reassuringly, they yield identical
results.

As a first method, consider the direct combination of the fiducial distributions. The joint
probability element for $\theta_1$ and $\theta_2$ is
$$p(\theta_1^{-1}x)\,q(\theta_2^{-1}y)\,\Delta(x)\,\Delta(y)\,d\nu(\theta_1)\,d\nu(\theta_2).$$
The available prior information states that the parameter values in the two systems are
equal. Formalizing this in the condition $\theta_1 = \theta_2$ and imposing this condition by reference to
the natural right Haar measure on $\Omega$ produces the following ‘fiducial’ probability element
for $\theta$ based on the combined systems:
$$\frac{p(\theta^{-1}x)\,q(\theta^{-1}y)\,d\nu(\theta)}{\int p(\eta^{-1}x)\,q(\eta^{-1}y)\,d\nu(\eta)}.$$
As a second method, consider the use of the fiducial distribution from one system as
a prior distribution for a Bayesian argument on the other system. The fiducial density
function from the first system is
$$p(\theta^{-1}x)\,\Delta(x)$$
with respect to right Haar measure. Using this as a prior density $\pi(\theta)$ for a Bayesian argument
on the second system (see §9) produces an a posteriori probability element for $\theta$:
$$\frac{p(\theta^{-1}x)\,q(\theta^{-1}y)\,d\nu(\theta)}{\int p(\eta^{-1}x)\,q(\eta^{-1}y)\,d\nu(\eta)}.$$
This is the result obtained by the first method.
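
The common element of the first two methods can be sketched numerically in the location case, where $\theta^{-1}x = x - \theta$ and $\nu$ is Lebesgue measure. The normal pivots with known standard deviations, the grid and the function names below are assumptions for the example only.

```python
import math

def normal_pdf(z, sd):
    """Density of N(0, sd^2) at z."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def combine_fiducial(x, sd_x, y, sd_y, grid):
    """Normalise p(x - theta) * q(y - theta) over a uniform grid (location case)."""
    weights = [normal_pdf(x - th, sd_x) * normal_pdf(y - th, sd_y) for th in grid]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed example: x = 1 with unit error s.d., y = 3 with error s.d. 2.
grid = [-20.0 + 0.01 * i for i in range(4001)]
probs = combine_fiducial(1.0, 1.0, 3.0, 2.0, grid)
combined_mean = sum(th * w for th, w in zip(grid, probs))
```

For normal pivots the combined distribution is centred at the precision-weighted mean of the two observations, here $(1 + 3/4)/(1 + 1/4) = 1.4$.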
For a third method, suppose that the combination of the two observational systems
admits a sufficient statistic $r(x, y)$. The joint density function for $x, y$ can then be factored
$$p(\theta^{-1}x)\,q(\theta^{-1}y) = f(r(x, y), \theta)\,w(x, y).$$
Let $k$ be some value of the statistic $r(x, y)$ and let $t(x, y)$ be the solution of
$$r(t^{-1}x, t^{-1}y) = k;$$
moderate regularity conditions will insure a unique solution. The definition of the function
$t(x, y)$ shows that $t(d^{-1}x, d^{-1}y)$ is a solution of
$$r(t^{-1}d^{-1}x, t^{-1}d^{-1}y) = k;$$
direct substitution shows that $d^{-1}t(x, y)$ is also a solution of the equation. Hence
$$t(d^{-1}x, d^{-1}y) = d^{-1}t(x, y)$$


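and $t(x, y)$ is transformed in the same way as a group element. The joint density function can
be manipulated as follows:
$$\begin{aligned}
p(\theta^{-1}x)\,q(\theta^{-1}y) &= p((t^{-1}\theta)^{-1}t^{-1}x)\,q((t^{-1}\theta)^{-1}t^{-1}y) \\
&= f(r(t^{-1}x, t^{-1}y), t^{-1}\theta)\,w(t^{-1}x, t^{-1}y).
\end{aligned}$$
Substituting $t = t(x, y)$ produces
$$\begin{aligned}
p(\theta^{-1}x)\,q(\theta^{-1}y) &= f(k, t^{-1}(x, y)\theta)\,w(t^{-1}(x, y)x, t^{-1}(x, y)y) \\
&= f^*(\theta^{-1}t(x, y))\,w^*(x, y),
\end{aligned}$$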


where $f^*$ is implicitly defined by the last step. This proves that $t(x, y)$ is a sufficient statistic.
In addition, $t(x, y)$ takes values in the group $\mathcal{G}$ and transforms simply: $t(gx, gy) = g\,t(x, y)$.
The last equation also establishes that $w^*(x, y)$ is invariant: $w^*(gx, gy) = w^*(x, y)$.

Simple analysis involving the last displayed equation, the invariance of $w^*(x, y)$, and the
transformation properties of $t(x, y)$ shows that the density function for $t$ is $K f^*(\theta^{-1}t)$ with
respect to left Haar measure ($K$ is a normalizing constant). The corresponding fiducial
probability element for $\theta$ given $t$ is then
$$K f^*(\theta^{-1}t)\,\Delta(t)\,d\nu(\theta).$$
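The fiducial probability element obtained by the first two methods can be simplified in the
face of the assumptions and notation used for this third method:
$$\begin{aligned}
\frac{p(\theta^{-1}x)\,q(\theta^{-1}y)\,d\nu(\theta)}{\int p(\eta^{-1}x)\,q(\eta^{-1}y)\,d\nu(\eta)}
&= \frac{f^*(\theta^{-1}t(x, y))\,w^*(x, y)\,d\nu(\theta)}{\int f^*(\eta^{-1}t(x, y))\,w^*(x, y)\,d\nu(\eta)} \\
&= \frac{f^*(\theta^{-1}t(x, y))\,\Delta(t(x, y))\,d\nu(\theta)}{\int f^*(\eta^{-1}t(x, y))\,\Delta(t(x, y))\,d\nu(\eta)} \\
&= K f^*(\theta^{-1}t(x, y))\,\Delta(t(x, y))\,d\nu(\theta).
\end{aligned}$$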





Thus the first two methods yield the same result as the third in problems in which the third 
method is applicable. 

In the third method the two observational systems are put together to form a single com- 
posite system and analysed under the assumption that the composite system admits a 
sufficient statistic. For a fourth method of analysis, suppose that the composite system does 
not admit a sufficient statistic. The probability element is 


$$p(\theta^{-1}x)\,q(\theta^{-1}y)\,d\mu(x)\,d\mu(y)$$
on the product space $\mathcal{X} \times \mathcal{Y}$. This model has the more general form discussed in §3; an ancillary
statistic is
$$a = x^{-1}y$$
and a conditionally sufficient statistic is
$$t = x.$$
The statistic $a = x^{-1}y = (gx)^{-1}gy$ is invariant under transformations in the group and hence
has a fixed distribution, that is, it is ancillary. The statistic $t$ on the other hand transforms as a group
element under transformations in the group.

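The carrying measure of a set $A \times T$ of values $(a, t)$ is
$$M(A \times T) = \int_{x^{-1}y \in A,\; x \in T} d\mu(x)\,d\mu(y),$$
$$\begin{aligned}
M(A \times gT) &= \int_{x^{-1}y \in A,\; x \in gT} d\mu(x)\,d\mu(y) \\
&= \int_{x^{-1}y \in A,\; x \in T} d\mu(gx)\,d\mu(gy) \\
&= \int_{x^{-1}y \in A,\; x \in T} d\mu(x)\,d\mu(y) \\
&= M(A \times T).
\end{aligned}$$
Thus the carrying measure as it applies to sets of values for $t$, given values for $a$, is left
invariant and hence is a constant times the left Haar measure $\mu$:
$$M(A \times T) = K\mu(T)$$
for $A$ fixed.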
Making the change of variable in the original density function and using the conditional
form of the carrying measure gives the following expression for the conditional probability
element of $t$ given $a$:
$$K p(\theta^{-1}t)\,q(\theta^{-1}ta)\,d\mu(t).$$
The fiducial distribution of $\theta$ given $t$ (and $a$) is then
$$K p(\theta^{-1}t)\,q(\theta^{-1}ta)\,\Delta(t)\,d\nu(\theta)$$
and this is the result obtained by the first two methods.

Four methods have been considered for obtaining fiducial distributions from separate 
systems concerned with the same parameter. The resulting distributions are identical, a very 
reassuring internal-consistency property of fiducial analysis in a transformation framework. 
Consider separate observational systems concerned with a single parameter, and suppose 
that each has the transformation form of §§ 2 or 3. The first two methods, consistent one with 
the other, give a ‘reasonable’ way of obtaining an overall fiducial distribution and the result 
is the same as obtained by the fourth method. Thus in effect the ancillary statistic for the 
overall system is generated by the ‘reasonable’ method of combining the individual fiducial 
distributions. In such situations then the ancillary statistic is not as arbitrary a thing as 
might be first thought, but is an intrinsic part of the model for purposes of inference. 

If $x$ and $\theta$ are real valued and if the assumptions in § 2 are fulfilled then the problem is of
location parameter form. If the distributions are discrete this result follows simply. If the 
distributions are continuous the result follows from Theorem 2.9.3 in Cohn (1957). 

Under mild regularity conditions Lindley (1958) has examined the consistency of methods 
2 and 3 for combining fiducial distributions. In the case that the variables correspond to 
different observations from the same distribution having a real variable and real parameter, 
he has shown that the consistency is obtained if and only if the parameter is essentially a 
location parameter—that is, that the group structure herein considered is present for the 
real variables. 

Consider again the location and scale parameter example mentioned in §3: $x_1, \ldots, x_n$ is a
sample from a distribution with density
$$\frac{1}{\sigma} f\!\left(\frac{x - \mu}{\sigma}\right)$$
on the real line; $f$ is specified. The density function for a sample of $n$ is
$$\frac{1}{\sigma^n} \prod_{i=1}^{n} f\!\left(\frac{x_i - \mu}{\sigma}\right).$$
For a fiducial analysis based on §3 the conditional distribution of $(\bar{x}, s)$ given the ancillary
statistic $A$ is needed and this is obtained directly from the joint density of $\bar{x}, s, A$. The
Jacobian of the transformation from $(x_1, \ldots, x_n)$ to $(\bar{x}, s, A)$ depends on $(\bar{x}, s)$ only in the form
of a factor $s^{n-2}$. Thus the conditional probability element for $\bar{x}, s$ given $A, \mu, \sigma$ has the form
$$\frac{K}{\sigma^n} \prod_{i=1}^{n} f\!\left(\frac{x_i - \mu}{\sigma}\right) s^{n}\,\frac{d\bar{x}\,ds}{s^2},$$
where the measure element has been adjusted to exhibit left Haar measure. The fiducial
method of § 5 together with the measure results of §§ 7 and 8 then give the following fiducial
probability element for $\mu, \sigma$ given $\bar{x}, s, A$:
If $n = 2$ no ancillary statistic is needed and the pair of variables is a sufficient statistic.
In this simple case the fiducial distribution for $\mu, \sigma$ can be derived without the seeming
arbitrariness of an ancillary statistic.

For n greater than 2 (and for present simplicity suppose it is even) the set of variables can 
be broken into pairs and a fiducial distribution from each obtained. The general result of this 
section then demonstrates that an overall fiducial distribution obtained by direct combina- 
tion (§11, method one) or by a Bayesian argument (§ 11, method two) will be just the fiducial 
distribution obtained here by the ancillary argument. This illustrates the intrinsic nature of 
the ancillary statistic for problems in the transformation framework, and gives an alternative 
demonstration of the essential uniqueness of the ancillary statistic for this example. 

The discussion in this section has been restricted to specifications that satisfy the trans- 
formation conditions in §§ 2 and 3. The fiducial method has been applied to other kinds of 
specifications and Quenouille (1958) in his book, The Fundamentals of Statistical Reasoning, 
has discussed methods for combining fiducial distributions. 


12. THE BEHRENS-FISHER PROBLEM 


Let $x_1, \ldots, x_m$ be a sample from a normal distribution $(\mu_1, \sigma_1^2)$, and let $y_1, \ldots, y_n$ be a sample
from a normal distribution $(\mu_2, \sigma_2^2)$. The Behrens-Fisher problem is concerned with the
estimation of the parameter difference $\mu_1 - \mu_2$.

The information concerning the frequency distributions relates the $x$’s to $(\mu_1, \sigma_1^2)$ and the
$y$’s to $(\mu_2, \sigma_2^2)$. The transformation analysis in §5 shows that this information is such that,
given the observed values $\bar{x}, s_1$, the knowledge concerning $\mu_1$ is that of a variable described by
$$\bar{x} + t_1 s_1/\sqrt{m},$$
where $t_1$ has Student’s distribution with $m - 1$ degrees of freedom. Similarly, the information
concerning $\mu_2$ is that of a variable described by
$$\bar{y} + t_2 s_2/\sqrt{n},$$
where $t_2$ is a statistically independent variable with Student’s distribution on $n - 1$ degrees
of freedom. These distributions are appropriate to the observed values $\bar{x}, s_1, \bar{y}, s_2$. The
frequency distribution of possible values for $\mu_1 - \mu_2$ is then described by
$$\bar{x} - \bar{y} + \left(\frac{s_1}{\sqrt{m}}\,t_1 - \frac{s_2}{\sqrt{n}}\,t_2\right).$$
The distribution of this fiducial variable was derived by Fisher and percentage points for it
can be obtained from tables prepared by Sukhatme.
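
In place of tables, the distribution of this fiducial variable can be approximated by simulation. The sketch below is illustrative only: it assumes that $s_1, s_2$ are the usual sample standard deviations, draws Student variables as a normal divided by the root of a scaled chi-square, and uses hypothetical observed values and function names.

```python
import math
import random

def student_t(df, rng):
    """Draw from Student's t as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

def behrens_fisher_draws(xbar, s1, m, ybar, s2, n, size, seed=0):
    """Simulate xbar - ybar + (s1*t1/sqrt(m) - s2*t2/sqrt(n)) for fixed observed values."""
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        t1 = student_t(m - 1, rng)
        t2 = student_t(n - 1, rng)
        draws.append(xbar - ybar + (s1 * t1 / math.sqrt(m) - s2 * t2 / math.sqrt(n)))
    return draws

# Hypothetical observed values: xbar = 1, ybar = 0, s1 = s2 = 1, m = n = 10.
draws = sorted(behrens_fisher_draws(1.0, 1.0, 10, 0.0, 1.0, 10, size=20000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]
```

The pair `(lower, upper)` is then an approximate central 95 % fiducial interval for $\mu_1 - \mu_2$ relative to the fixed observed values.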

This fiducial distribution for $\mu_1 - \mu_2$ has a frequency interpretation by means of the
argument developed in § 5, and is relative to the fixed observed values $\bar{x}, \bar{y}, s_1, s_2$. It need
not be and is not in general true that for a fixed value of $\mu_1 - \mu_2$, a 95 % fiducial interval will
cover the value of $\mu_1 - \mu_2$ with frequency 95 %. Rather, for fixed $\bar{x}, \bar{y}, s_1, s_2$ the prior frequency
information relating parameter to variable as it is given freedom in a transformation framework
gives a frequency distribution of possible parameter values yielding the just-derived
marginal distribution for $\mu_1 - \mu_2$. Some discussion on such a broader use of probability in
science as opposed to technology may be found in Fisher (1955, 1959).

A fiducial interval for $\mu_1 - \mu_2$ will not be an invariant interval with respect to the transformations
for the $x$’s and for the $y$’s. In fact, under separate linear transformations for the
$x$’s and for the $y$’s, the parameter $\mu_1 - \mu_2$ itself is not transformed but is ‘pulled apart’.
A non-trivial invariant interval does not exist. However, the invariance was used only to
establish the frequency distributions and in consequence taking a marginal distribution for
$\mu_1 - \mu_2$ need not be related to the invariance.


The referee who examined the original version of this paper offered many suggestions on 
arrangements and on detail which were very valuable to me in preparing the revision. 
I appreciate his help. 

Prof. Lindley, as an editorial adviser for Biometrika, examined the paper in detail, made 
many valuable suggestions for reorganizing the material and thereby helped me towards the 
present version which I feel is a major improvement over the original. I thank him with 
deep appreciation. 
Bayesian sequential analysis 


By G. B. WETHERILL 
Birkbeck College 


1. DISTRIBUTIONS CLOSED UNDER SAMPLING 


Suppose we have batches of items presented for acceptance inspection, and each batch 
is to be sentenced independently by a sequential procedure. Let the result of inspection of 
an item be a random variable $x$ with a density function $\phi(x|\theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$.
In the simplest case where items are classified as effective or defective, $\phi(x|\theta)$ is a binomial
distribution with probability of a defective equal to $\theta$.

In acceptance inspection, the prior knowledge is represented by the process curve, which 
is a distribution of $\theta$ specifying the relative frequency with which batches are produced
with quality $\theta$.

In general let the prior knowledge be represented by a density function which we shall
call a parameter distribution, denoted $\xi(\theta|\alpha)$, depending on a known constant $\alpha$ (where $\alpha$
may be a vector). For convenience throughout §§1 to 3 of the paper, equations will be
formulated for continuous density functions $\xi(\theta|\alpha)$ and $\phi(x|\theta)$. A fully rigorous discussion
of the theory will not be attempted.

After observations $x_1, x_2, \ldots, x_n$ have been taken the posterior distribution of $\theta$ is
$$\frac{\xi(\theta|\alpha)\,\phi(x_1|\theta)\,\phi(x_2|\theta) \cdots \phi(x_n|\theta)}{\int \xi(\theta|\alpha)\,\phi(x_1|\theta)\,\phi(x_2|\theta) \cdots \phi(x_n|\theta)\,d\theta}.$$





Suppose now that this posterior distribution is of the same functional form as $\xi(\theta|\alpha)$,
say $\xi(\theta|\beta)$. Our total knowledge at any point is then represented as a distribution of this
type. We shall call a distribution possessing this property a distribution closed under
sampling, with respect to observations having a distribution $\phi(x|\theta)$.


Formal definition (due to Barnard)* 


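A parametric family $\xi(\theta|\alpha)$ is said to be closed under sampling with respect to observations
having a density function $\phi(x|\theta)$, if for all $x_1, x_2, \ldots, x_n$ for which $\phi(x_i|\theta) \neq 0$, there exists
a $\beta$ such that
$$\xi(\theta|\beta) = \frac{\xi(\theta|\alpha)\,\phi(x_1|\theta)\,\phi(x_2|\theta) \cdots \phi(x_n|\theta)}{\int \xi(\theta|\alpha)\,\phi(x_1|\theta)\,\phi(x_2|\theta) \cdots \phi(x_n|\theta)\,d\theta}, \qquad (1)$$
where $\beta = \beta(\alpha, x_1, x_2, \ldots, x_n)$. To make this clear we shall consider one or two examples.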

(a) Consider inspection of batches by attributes, and suppose that items are of two kinds, 
effective and defective. Suppose that the probability of a defective in each batch is p, and 
that $p$ varies from batch to batch according to the Beta distribution $k p^{s}(1-p)^{t}$. This is our


* Dr M. Stone has independently studied the idea of distributions closed under sampling in an 
unpublished Ph.D. thesis. 
parameter density function representing our prior knowledge. The posterior distribution
after observing $r$ defectives in $n$ items is
$$\frac{k p^{s} q^{t} \cdot p^{r} q^{n-r}}{k \int_0^1 p^{s} q^{t} \cdot p^{r} q^{n-r}\,dp} = k' p^{r+s} q^{n-r+t},$$
where $q = 1 - p$, which is also a Beta distribution. The family $\xi(p|\beta)$ is therefore the class of all Beta
distributions $k p^{\beta_1} q^{\beta_2}$, where $\beta = \{\beta_1, \beta_2\}$, and $\beta_1 = r + s$, $\beta_2 = n - r + t$.

Thus a Beta distribution is closed under sampling with respect to binomially distributed 
observations. The total knowledge at any point in the sampling can be represented in a 
two-dimensional space with co-ordinates $(\beta_1, \beta_2)$.
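
The movement of the point $(\beta_1, \beta_2)$ under sampling can be sketched as follows. The function names are hypothetical; the exponents follow the parametrization $k p^{s} q^{t}$ used above, and the sketch simply shows that updating in one batch or item by item leads to the same point.

```python
def beta_update(s, t, r, n):
    """Posterior exponents (beta_1, beta_2) for prior k p^s (1-p)^t after r defectives in n items."""
    return (s + r, t + n - r)

def beta_update_sequential(s, t, items):
    """Update one item at a time; items is a sequence of 1 (defective) or 0 (effective)."""
    b1, b2 = s, t
    for item in items:
        b1, b2 = beta_update(b1, b2, item, 1)
    return (b1, b2)

# Hypothetical prior exponents s = 2, t = 3 and one defective in five items.
batch = beta_update(2, 3, r=1, n=5)
stepwise = beta_update_sequential(2, 3, [0, 1, 0, 0, 0])
```

Each inspected item moves the point one unit along one of the two axes, which is the finite-dimensional random walk referred to later in this section.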

(b) Suppose that our observations are distributed according to any probability distribution
$\phi(x|\theta)$, where $\theta$ varies from batch to batch, but may only take certain given values
$\theta_i$ $(i = 1, 2, \ldots, k)$. Our prior probability is then $(a_1, a_2, \ldots, a_k)$, where $\sum_{1}^{k} a_i = 1$. The posterior
distribution is of the same form $(b_1, b_2, \ldots, b_k)$, where
$$b_i = \frac{a_i \prod_j \phi(x_j|\theta_i)}{\sum_{h=1}^{k} a_h \prod_j \phi(x_j|\theta_h)}$$
and $\prod_j$ denotes the product over $j = 1, \ldots, n$.

The set of values $\theta_i$ does not change, and sampling merely alters the set of probabilities
$a_i$. Therefore a prior distribution, which takes its values at a finite set of points $\theta_i$, is closed
under sampling with respect to observations distributed according to any probability
function $\phi(x|\theta)$. The family $\xi(\theta|\alpha)$ is the class of all discrete distributions where the random
variable $\theta$ may only take on the values $\theta_i$ $(i = 1, 2, \ldots, k)$.
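
A minimal sketch of the update in example (b), written for an arbitrary observation model supplied as a function. The names `discrete_posterior` and `bernoulli`, and the three-point prior and data, are hypothetical choices for illustration.

```python
def discrete_posterior(thetas, a, observations, phi):
    """Return b_i proportional to a_i * prod_j phi(x_j | theta_i)."""
    weights = []
    for theta, a_i in zip(thetas, a):
        likelihood = 1.0
        for x in observations:
            likelihood *= phi(x, theta)
        weights.append(a_i * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

def bernoulli(x, theta):
    """Probability of x (1 = defective, 0 = effective) when the defective rate is theta."""
    return theta if x == 1 else 1.0 - theta

# Hypothetical three-point prior and three inspected items.
thetas = [0.05, 0.10, 0.20]
prior = [0.5, 0.3, 0.2]
posterior = discrete_posterior(thetas, prior, [1, 0, 1], bernoulli)
```

Only the weights change; the support `thetas` is carried through unaltered, which is what closure under sampling means here.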

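(c) If our observations are distributed according to the Poisson distribution
$$\phi(x|\theta) = e^{-\theta}\theta^{x}/x!$$
and if our prior knowledge is given by the Beta-type distribution
$$\xi(\theta|\alpha) = K\theta^{s}(1-\theta)^{t},$$
where $0 < \theta < 1$, and the case $t = 0$ corresponds to the improper distribution $k\theta^{s}$ over all
positive $\theta$: then after taking observations $x_1, \ldots, x_n$ the posterior distribution is
$$\frac{K\theta^{s}(1-\theta)^{t}\,e^{-n\theta}\theta^{m}/\prod x_i!}{K\int \theta^{s}(1-\theta)^{t}\,e^{-n\theta}\theta^{m}\,d\theta/\prod x_i!} = \frac{\theta^{m+s}(1-\theta)^{t}\,e^{-n\theta}}{\int \theta^{m+s}(1-\theta)^{t}\,e^{-n\theta}\,d\theta},$$
where $m = \sum_i x_i$.
This is not of the same form as $\xi(\theta|\alpha)$, and hence here we have an example of a distribution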
not closed under sampling. It can be made closed by replacing the Beta distribution by the 


probability density k02+2(1 — 0) e-79 /ze! 


and if we use this as our prior distribution, we see that it is closed under sampling with 
respect to observations having a Poisson distribution. 
The great simplification introduced by considering distributions closed under sampling 
is that the problem of sequential decision is reduced from dealing with a random walk in 
infinitely many dimensions, to being a random walk in finitely many. 


2. THE $\xi$-SPACE 


If we have a parameter distribution $\xi(\theta|\alpha)$, where $\alpha$ may be a vector, and this distribution
is closed under sampling, we can define some space $\xi$, in which each member of the family
of distributions of the type $\xi(\theta|\alpha)$ is represented by a point. Our total knowledge at any
stage in the sampling is a point in this space.

For examples of $\xi$-space we refer to examples (a) and (b) in §1. The $\xi$-space for (a) is a
plane with axes $\beta_1$ and $\beta_2$ and that for (b) is a $k$-dimensional simplex with vertices at the
unit points.

Now, assuming that we are dealing with a prior distribution which is closed under
sampling, consider some form of sequential inspection in which there are only two possible
terminal decisions, to accept or reject the batch under inspection. Suppose also we can
define the losses associated with making the terminal decisions when the true quality of
the batch is specified by some particular value of $\theta$. We define the risk of a terminal decision
at any point in the $\xi$-space to be the expected value of the loss with respect to $\theta$. One terminal
decision is preferred to the other if the risk associated with making this terminal decision
is less than the risk associated with making the other terminal decision. If the quality of the
batches remains unaltered during inspection, then the risk associated with making a terminal
decision depends only upon the posterior distribution function, and hence on the position
in the $\xi$-space. We can therefore associate with each point in the $\xi$-space the risks of taking
either of the two kinds of terminal decision, and depending upon which of these is the smaller
at each point, mark the $\xi$-space out into two regions, one in which acceptance is preferred
and the other in which rejection is preferred. Clearly, apart from possible discontinuities,
the boundary between these regions is the locus of all points such that the risks of the two
kinds of terminal decision are equal. This boundary will be referred to as the neutral
boundary.
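
For a discrete parameter distribution the two terminal risks are simple weighted sums, and the classification of a point can be sketched as below. The function names and the two-point losses are hypothetical and serve only to illustrate the definition.

```python
def terminal_risks(a, w1, w2):
    """Expected losses of decisions 1 and 2 under a discrete distribution a over theta_i."""
    risk1 = sum(a_i * w for a_i, w in zip(a, w1))
    risk2 = sum(a_i * w for a_i, w in zip(a, w2))
    return risk1, risk2

def preferred_decision(a, w1, w2):
    """Return 1 or 2 for the smaller risk, or 0 on the neutral boundary."""
    risk1, risk2 = terminal_risks(a, w1, w2)
    if risk1 < risk2:
        return 1
    if risk2 < risk1:
        return 2
    return 0

# Hypothetical two-point illustration with symmetric losses.
w1, w2 = [10.0, 0.0], [0.0, 10.0]
on_boundary = preferred_decision([0.5, 0.5], w1, w2)
```

With these symmetric losses the point $(0.5, 0.5)$ lies on the neutral boundary, and any shift of probability to one side moves it into the corresponding terminal decision region.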

This division of the $\xi$-space into terminal decision regions does not take sampling into
account. In fact, we shall have extremes in the $\xi$-space where the expected quality of the
batch is so good or so bad that it should be accepted or rejected outright, without any
sampling. In between there will be a region of doubt where it pays to take further observations.
We thus have three regions, a continuation region and two terminal decision regions,
each defined uniquely by position in the $\xi$-space.

Four boundaries are of interest: the two boundaries between the terminal decision regions
and the continuation region; the boundary between the two terminal decision regions; and
the boundary at which the continuation region and the two terminal decision regions meet.
Clearly, some of these boundaries may not exist; for example, in a likelihood ratio sequential
procedure, the two terminal decision regions do not meet. We proceed to derive equations
for each of these four boundaries.


3. EQUATIONS FOR THE BOUNDARIES 


We shall refer to the two terminal decisions as decisions 1 and 2; the boundary in the
$\xi$-space between the decision 1 (or 2) region and the other terminal decision region or the
continuation region, will be referred to as the decision 1 (or 2) boundary. Let the losses of
taking decisions 1 and 2 when $\theta$ specifies the true quality be denoted $W_1(\theta)$ and $W_2(\theta)$,
respectively, where the unit of losses is the cost of a sample, which is assumed constant.

We now consider the boundary where the three decision regions meet; this is often a 
point and will then be referred to as the meeting point. Interest centres in the meeting 
point because if it exists and can be determined, the next section describes a method by 
which, in principle, the optimum terminal decision boundaries can be worked out backwards 
from the meeting point. 

The neutral boundary is the locus of points such that the decision risks (where risk is
defined as expected loss) of taking either terminal decision are equal, and this is given by
$\alpha$ satisfying
$$\int \xi(\theta|\alpha)\,[W_1(\theta) - W_2(\theta)]\,d\theta = 0. \qquad (2)$$


Clearly, if the decision 1 and 2 boundaries meet, they will meet on the neutral boundary. 

Suppose now that the decision 1 and 2 boundaries meet the neutral boundary at the same
point, then at this point the risk of making a terminal decision is equal to the cost of one
more observation, plus the terminal decision risks associated with this further observation
(one more observation from the meeting point can only lead to a terminal decision). If at
the meeting point we make decision 2 (the resulting equation is the same whichever terminal
decision is used here), then for a given $\theta$, the risk of a terminal decision at the meeting point
is simply $W_2(\theta)$. After one more observation is taken, the terminal decision risks are
$$q(\alpha, \theta)\,W_1(\theta) + [1 - q(\alpha, \theta)]\,W_2(\theta),$$
where $q(\alpha, \theta)$ is the probability that for an $\alpha$ on the meeting point boundary, one observation
leads to points in terminal decision 1 region. Thus for the meeting point, $\alpha$ should
satisfy
$$\int [q(\alpha, \theta)\,W_1(\theta) + (1 - q(\alpha, \theta))\,W_2(\theta) + 1]\,\xi(\theta|\alpha)\,d\theta = \int \xi(\theta|\alpha)\,W_2(\theta)\,d\theta,$$
which reduces to
$$\int q(\alpha, \theta)\,[W_2(\theta) - W_1(\theta)]\,\xi(\theta|\alpha)\,d\theta = 1. \qquad (3)$$


Two points remain to be clarified, first, that the two terminal decision boundaries do in 
fact meet the neutral boundary at the same point, and secondly, that the locus of $\alpha$ satisfying
equations (2) and (3) is a complete definition of the meeting point boundary. Both of
these points can be established in the following way. First, we derive equations for the 
decision 1 and 2 boundaries. Simultaneous solution of both of these equations defines the 
boundary where the two terminal decision boundaries meet, and it will be seen that the 
resulting equations can be reduced to equations (2) and (3) above. 

Let points on the decision 1 and 2 boundaries be denoted by $\alpha'$ and $\alpha''$, respectively.
For sampling starting from position $\alpha$ in the continuation region of the $\xi$-space, with given
boundaries, and the value of $\theta$ specified, let $S(\alpha, \theta)$ and $A(\alpha, \theta)$ be the expected further
sample size, and the probability that sampling eventually terminates with decision 1,
respectively. For points $\alpha'$ and $\alpha''$ on the terminal decision boundaries, $S(\alpha', \theta)$, $A(\alpha', \theta)$,
etc., are defined to be the values obtained if sampling is continued, but otherwise with the
same terminal decision boundaries.

Consider any point $\alpha'$ on the decision 1 boundary, then the risk of continuing sampling
is equal to the risk of taking terminal decision 1 immediately, see Barnard (1954).
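To decide immediately has risk
$$\int \xi(\theta|\alpha')\,W_1(\theta)\,d\theta.$$
To continue sampling has risk
$$\int [S(\alpha', \theta) + (1 - A(\alpha', \theta))\,W_2(\theta) + A(\alpha', \theta)\,W_1(\theta)]\,\xi(\theta|\alpha')\,d\theta.$$
The equation for the value(s) of $\alpha'$ on the decision 1 boundary is obtained by equating
these expressions,
$$\int \xi(\theta|\alpha')\,W_1(\theta)\,d\theta = \int [S(\alpha', \theta) + (1 - A(\alpha', \theta))\,W_2(\theta) + A(\alpha', \theta)\,W_1(\theta)]\,\xi(\theta|\alpha')\,d\theta \qquad (4)$$
which reduces to
$$\int [S(\alpha', \theta) + \{1 - A(\alpha', \theta)\}\{W_2(\theta) - W_1(\theta)\}]\,\xi(\theta|\alpha')\,d\theta = 0. \qquad (5)$$
Similarly for points $\alpha''$ on the decision 2 boundary we obtain the equation
$$\int \xi(\theta|\alpha'')\,W_2(\theta)\,d\theta = \int [S(\alpha'', \theta) + A(\alpha'', \theta)\,W_1(\theta) + (1 - A(\alpha'', \theta))\,W_2(\theta)]\,\xi(\theta|\alpha'')\,d\theta \qquad (6)$$
which reduces to
$$\int [S(\alpha'', \theta) + A(\alpha'', \theta)\{W_1(\theta) - W_2(\theta)\}]\,\xi(\theta|\alpha'')\,d\theta = 0. \qquad (7)$$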


It was pointed out above that the meeting-point boundary will be defined by $\alpha = \alpha' = \alpha''$
simultaneously satisfying equations (5) and (7). If such a solution exists, then provided
the next step from points on this boundary does not land back in the continuation region
(see §§4 and 8 for discussion of this and other assumptions), we have $S(\alpha, \theta) = 1$ and
$A(\alpha, \theta) = q(\alpha, \theta)$. On substituting these values in equations (5) and (7), equation (7)
becomes equation (3), and equation (5) can be reduced to equation (2). Thus if a solution to
equations (2) and (3) exists, then this solution defines the locus of points where both the
terminal decision boundaries meet the neutral boundary.

Equations (5) and (7) can be transformed into each other by using the equation for the 
neutral boundary. Therefore, no points in the $\xi$-space exist where the neutral boundary
meets one only of the terminal decision boundaries. 

Equations (5) and (7) are difficult to solve because of the complex form of $S(\alpha, \theta)$ and
$A(\alpha, \theta)$. On the other hand, the function $q(\alpha, \theta)$ is usually very simple, so that the neutral
boundary and the meeting-point boundary can be determined. The next section describes 
the numerical procedure by which the terminal decision boundaries can be worked out 
backwards from the meeting-point boundary. 

Equations (5) and (7) can usually be solved and used in the following special case. Consider
points $\alpha'$ and $\alpha''$ on the terminal decision boundaries which are such that they lead
only to points on the meeting-point boundary or to terminal decision points. For such
points, $S(\alpha', \theta)$ and $S(\alpha'', \theta)$ are both unity, $A(\alpha', \theta)$ and $A(\alpha'', \theta)$ are of the form $q(\alpha, \theta)$,
so that for points on the decision 2 boundary equation (3) holds, with a similar equation for
the decision 1 boundary.


4. DETERMINATION OF THE BOUNDARIES FROM THE MEETING-POINT BOUNDARY 


If we are considering a point $\beta$ in the $\xi$-space, then the terminal decision risk at this point
is
$$R_d(\beta) = \min\left\{\int \xi(\theta|\beta)\,W_1(\theta)\,d\theta,\ \int \xi(\theta|\beta)\,W_2(\theta)\,d\theta\right\} \qquad (8)$$
and the continuation risk is
$$R_c(\beta) = 1 + \iint \xi(\theta|\beta)\,\phi(x|\theta)\,R(\beta')\,dx\,d\theta, \qquad (9)$$
where $\beta'$ denotes the posterior distribution for a prior distribution specified by $\beta$ and an
observation $x$, and where $R(\beta)$ is the risk at $\beta$, defined
$$R(\beta) = \min\{R_c(\beta), R_d(\beta)\} \qquad (10)$$
so that if $R(\beta) = R_c(\beta)$, $\beta$ is a continuation point, and if $R(\beta) = R_d(\beta)$, $\beta$ is a terminal
decision point.

In this section we make the following assumptions (in § 8 these assumptions are discussed 
again in some detail). 

Assumption A. Given any particular origin in the $\xi$-space, if any point can be reached in
$n$ steps, then it cannot be reached in any other than $n$ steps.

Assumption B. Given any particular origin in the $\xi$-space, of all points which can be
reached in $n$ steps, there is at most one which satisfies equation (2) for the neutral boundary.
(This excludes such cases as testing two normal populations with unequal variances. This
assumption is not necessary, and is introduced here for simplification.)

Assumption C. If a meeting-point boundary exists, then for a given origin in the $\xi$-space
the maximum of the sample sizes associated with points on this boundary has some finite
value, say $N$. (See §8 for a further explanation of this assumption.)

With regard to assumption C, no points of the subset designated by $N$, other than the
one on the meeting-point boundary, can be continuation points. This follows, since for a
continuation point, it must be possible to proceed sampling and reach either terminal
decision region, and this implies the existence of a point on the meeting-point boundary for
some $N' > N$. Therefore, we can write down all the risks for the points of the subset designated
by $N$. Further, points of the subset designated by $(N - 1)$ all lead to points of the
subset $N$, the risks of which are known. We can therefore apply equations (8), (9), and (10)
to all the points of the subset $(N - 1)$, calculate their risks and classify them as decision or
continuation points. This process can now be repeated for the points of the subset $(N - 2)$
and so on.

This procedure is quite simple in practice, and in principle can always be done numerically, 
by the method known as dynamic programming, see Bellman (1957). A large amount of 
computing will often be required, but the problem is ideally suited to an electronic computer. 
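
The backward calculation can be sketched for binomial observations with a $k$-point parameter distribution. This is an illustrative sketch under stated assumptions, not the procedure of the paper in detail: sampling is simply forced to stop after `horizon` items (standing in for the meeting-point sample size $N$), the recursion is written with memoization rather than as an explicit backward sweep, and all names are hypothetical. Losses are in units of the cost of one observation.

```python
from functools import lru_cache

def optimal_risk(thetas, a, w1, w2, horizon):
    """Risk R at the origin (n = 0, r = 0) for binomial inspection with a k-point prior."""

    def posterior(n, r):
        # Weights proportional to a_i * theta_i^r * (1 - theta_i)^(n - r).
        weights = [a_i * th ** r * (1.0 - th) ** (n - r) for a_i, th in zip(a, thetas)]
        total = sum(weights)
        return [w / total for w in weights]

    def decision_risk(b):
        # Equation (8): the smaller of the two expected terminal losses.
        return min(sum(b_i * w for b_i, w in zip(b, w1)),
                   sum(b_i * w for b_i, w in zip(b, w2)))

    @lru_cache(maxsize=None)
    def risk(n, r):
        b = posterior(n, r)
        r_d = decision_risk(b)
        if n >= horizon:
            return r_d  # assumed truncation: a terminal decision is forced here
        p_def = sum(b_i * th for b_i, th in zip(b, thetas))
        # Equation (9): one unit of sampling cost plus the expected risk one step on.
        r_c = 1.0 + p_def * risk(n + 1, r + 1) + (1.0 - p_def) * risk(n + 1, r)
        # Equation (10): take the cheaper of stopping and continuing.
        return min(r_d, r_c)

    return risk(0, 0)

# Hypothetical two-point example: accepting a bad batch or rejecting a good one costs 100.
no_sampling = optimal_risk([0.05, 0.30], [0.5, 0.5], [0.0, 100.0], [100.0, 0.0], horizon=0)
with_sampling = optimal_risk([0.05, 0.30], [0.5, 0.5], [0.0, 100.0], [100.0, 0.0], horizon=30)
```

A point $(n, r)$ is a continuation point when the continuation risk is the smaller of the two, so the same recursion could be adapted to record the classification of each point as well as its risk.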


5. EXAMPLE 1. THE MIXED BINOMIAL DISTRIBUTION 
Suppose that the distribution $\phi(x|\theta)$ is a binomial distribution with a probability $\theta$ of
a defective item, and suppose that the parameter distribution for $\theta$ is a $k$-point binomial
distribution with $k \geq 3$, so that $\operatorname{prob}(\theta = \theta_i) = a_i$ $(i = 1, 2, \ldots, k)$, where $\sum_{1}^{k} a_i = 1$. The
terminal decisions are to accept or reject the batch under inspection, and these are denoted
decisions 1 and 2 respectively. The loss associated with taking decision $i$ when $\theta_j$ is true is
written $W_{ij}$.

Equation (2) gives the neutral boundary
$$\sum_{i=1}^{k} a_i\,\theta_i^{r}(1 - \theta_i)^{n-r}\,(W_{1i} - W_{2i}) = 0, \qquad (11)$$
where $n$ is the number of items inspected, and $r$ the number of defectives found. Similarly,
in equation (3) the function $q(\alpha, \theta_i)$ is the probability, given $\theta_i$, that one more observation
from the meeting point leads to acceptance. The next observation can be either a good item
or a bad item, and both lead to terminal decision points. One more good item can only
increase the risk of rejection and decrease the risk of acceptance, so that $q(\alpha, \theta_i) = 1 - \theta_i$,
and equation (3) is
$$\sum_{i=1}^{k} a_i\,\theta_i^{r}(1 - \theta_i)^{n-r}\,[1 + (1 - \theta_i)(W_{1i} - W_{2i})] = 0. \qquad (12)$$

1 
Wetherill (1960) has pointed out that equations such as these take on a particularly
simple form if we assume the prior distribution to be such that $\theta_i/(1 - \theta_i) = \lambda^{i}$, for
$i = 1, 2, \ldots, k$, and it is also shown that these equations are easily solved by Horner’s method.
If we make this assumption, equations (11) and (12) for the meeting point become
$$\sum_{i=1}^{k} a_i (1 - \theta_i)^{n}\,x^{i}\,(W_{1i} - W_{2i}) = 0 \qquad (13)$$
and
$$\sum_{i=1}^{k} a_i (1 - \theta_i)^{n}\,x^{i}\,[1 + (1 - \theta_i)(W_{1i} - W_{2i})] = 0, \qquad (14)$$
where $x = \lambda^{r}$.

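Numerical example. Suppose that we have $\lambda = \tfrac{1}{3}$, so that $\theta_1 = \tfrac{1}{4}$, $\theta_2 = \tfrac{1}{10}$, $\theta_3 = \tfrac{1}{28}$; also
suppose that $W_{1i} = \max\{0, 500(\theta_i - 0.07)\}$ and $W_{2i} = \max\{0, 500(0.07 - \theta_i)\}$, so that
$W_{11} = 90$, $W_{12} = 15$, $W_{23} = 17.14$; and take the values of $a_i$ to be $a_1 = 0.05$, $a_2 = 0.25$, and
$a_3 = 0.70$.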

For this example, equations (13) and (14) are quadratic equations, and they have a 
common solution for an x corresponding to r ~ 1-75, n = 15. (For the method of solution 
see Wetherill (1960).) Because of the discreteness of the problem, the last point in the con- 
tinuation region must correspond to integral n and r both less than or equal to the values 
given for the meeting point, so that for this example, r = 0 or 1,andn < 15. We can therefore 
proceed by determining the position of the acceptance and rejection boundaries for r = 1. 
This can be done by the method outlined in §4, or alternatively it is quite simple for the 
stage immediately prior to the meeting point, to employ equations (5) and (7). 

Consider equation (5) for the acceptance boundary. When $r = 1$ one further observation
leads to a terminal decision, so that $S(\alpha, \theta_i) = 1$, and $A(\alpha, \theta_i) = 1-\theta_i$, and the equation
becomes

$$\sum_{i=1}^{k} a_i\,\theta_i^{r}(1-\theta_i)^{n-r}\,[1 + \theta_i(W_{2i} - W_{1i})] = 0,$$

from which the value of $n$ giving the position of the acceptance boundary at $r = 1$ can be
determined. At the rejection boundary for $r = 1$, one further good item will not always
lead to acceptance (if there is a point in the continuation region), and therefore $S(\alpha, \theta_i) \ge 1$,
and $A(\alpha, \theta_i) \le 1-\theta_i$. When equality holds in these expressions, equation (7) for the rejection
boundary reduces to equation (12), and it follows that for $r = 1$, all points for which the
expression of equation (12) is positive are certainly rejection points. Any points for $r = 1$
not thus classified by these two boundaries may be either continuation points or rejection
points, and these can be checked numerically by the method given below, or by using an
equation similar to (12) in which we may have to introduce a more complicated function
for $S(\alpha, \theta_i)$. For this example $(n = 9, r = 1)$ is the last continuation point, and $(n = 10, r = 1)$
and $(n = 7, r = 1)$ are acceptance and rejection points, respectively.










For the problem outlined above, the procedure for working back to find the optimum
decision boundaries is as follows. Denote the posterior probability of $\theta_i$ at any point
$(n, r)$ by $a_i'$, so that at $(n, r)$

$$a_i' = a_i\,\theta_i^{r}(1-\theta_i)^{n-r} \Big/ \sum_{j=1}^{k} a_j\,\theta_j^{r}(1-\theta_j)^{n-r}. \qquad (15)$$

(Actually we should write this as $a_i'(n, r)$, but it simplifies the notation to use simply $a_i'$.)
The terminal decision risk at $(n, r)$, $R_d(n, r)$, is defined

$$R_d(n, r) = \min\left(\sum_{i=1}^{k} a_i' W_{1i},\; \sum_{i=1}^{k} a_i' W_{2i}\right). \qquad (16)$$

The probability of a bad item at $(n, r)$ is $\sum_{i=1}^{k} a_i'\theta_i = b(n, r)$, say. Then the risk at any point is

$$R(n, r) = \min\{R_d(n, r),\, R_c(n, r)\}, \qquad (17)$$

where $R_c(n, r)$ is the continuation risk defined by

$$R_c(n, r) = 1 + b(n, r)\,R(n+1, r+1) + [1 - b(n, r)]\,R(n+1, r). \qquad (18)$$


On using these equations for the above numerical example we have the following sequential 
scheme: 

Accept the batch when sampling reaches n = 9, r = 0. 

Reject the batch when sampling reaches $r = 1$, and $n \le 7$.

If sampling reaches n = 8, r = 1, inspect another item, and reject if it is bad. 

If sampling reaches n = 9, r = 1, inspect another item, reject if it is bad, and accept if 
it is good. 

The total risk of this scheme is 4.96, as against the risk of 7.2 of accepting without sampling.
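
The working-back procedure of equations (15) to (18) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the sampling cost is taken as one unit per item (as in equation (18)), and the recursion is started from an assumed truncation horizon `N_MAX` at which a terminal decision is forced, which is a convenience of the sketch and not part of the paper's method. The figures it produces are not claimed to reproduce those quoted in the text.

```python
from functools import lru_cache

# Numerical example of Section 5: three-point prior with theta_i/(1-theta_i) = (1/3)**i.
THETA = [1 / 4, 1 / 10, 1 / 28]
PRIOR = [0.05, 0.25, 0.70]
W1 = [max(0.0, 500 * (t - 0.07)) for t in THETA]  # loss of accepting (decision 1)
W2 = [max(0.0, 500 * (0.07 - t)) for t in THETA]  # loss of rejecting (decision 2)
N_MAX = 60  # assumed horizon: sampling is forced to stop here (sketch only)


def posterior(n, r):
    """Posterior probabilities a_i' at the point (n, r), equation (15)."""
    weights = [a * t**r * (1 - t) ** (n - r) for a, t in zip(PRIOR, THETA)]
    total = sum(weights)
    return [w / total for w in weights]


def terminal_risk(n, r):
    """Terminal decision risk R_d(n, r), equation (16)."""
    post = posterior(n, r)
    accept = sum(p * w for p, w in zip(post, W1))
    reject = sum(p * w for p, w in zip(post, W2))
    return min(accept, reject)


@lru_cache(maxsize=None)
def risk(n, r):
    """Risk R(n, r) of equation (17), found by working back from N_MAX."""
    r_d = terminal_risk(n, r)
    if n >= N_MAX:
        return r_d
    b = sum(p * t for p, t in zip(posterior(n, r), THETA))  # prob. of a bad item
    r_c = 1 + b * risk(n + 1, r + 1) + (1 - b) * risk(n + 1, r)  # equation (18)
    return min(r_d, r_c)


def is_continuation_point(n, r):
    """True when continuing is strictly cheaper than stopping at (n, r)."""
    return risk(n, r) < terminal_risk(n, r)
```

Calling `risk(0, 0)` gives the risk of the resulting scheme from the origin, and `is_continuation_point(n, r)` can be used to trace the boundaries point by point.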


6. EXAMPLE 2. A BETA PRIOR DISTRIBUTION 


Suppose that the distribution $\phi(x|\theta)$ is a binomial distribution with a probability $\theta$
of a defective item, and that the parameter distribution for $\theta$ is a Beta distribution
$\theta^{s-1}(1-\theta)^{t-1}/B(s, t)$. Let the decision losses associated with wrongly accepting or rejecting
a batch be $k\,|\theta - \theta_0|$.

Denote the number of bad items found during inspection by $y$, and the number of good
items found by $x$; then equations (5) and (7), or alternatively (2) and (3), reduce to

$$(y+s)(1-\theta_0)/\theta_0 = x+t, \qquad (19)$$
$$(y+s+x+t+1)(y+s+x+t)/k = (x+t)\,[\theta_0(x+t+1) - (1-\theta_0)(y+s)]. \qquad (20)$$

From these equations it follows that the meeting point is at $(X, Y)$, where

$$Y = k\theta_0^{2}(1-\theta_0) - \theta_0 - s, \qquad (21)$$
$$X = (Y+s)(1-\theta_0)/\theta_0 - t. \qquad (22)$$
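
As a quick numerical illustration, equations (21) and (22) can be evaluated directly. The sketch below uses a hypothetical function name, and the prior parameters s = t = 1 (a uniform prior) are an assumption: they are not stated in the text, but they are the values that give the symmetric meeting point X = Y = 23.5 quoted for the worked example with θ0 = 1/2 and k = 200.

```python
def beta_meeting_point(k, theta0, s, t):
    """Meeting point (X, Y) of the optimum boundaries, equations (21) and (22)."""
    y = k * theta0**2 * (1.0 - theta0) - theta0 - s
    x = (y + s) * (1.0 - theta0) / theta0 - t
    return x, y


# Assumed uniform prior (s = t = 1); theta0 = 1/2 and k = 200 as in the worked example.
X, Y = beta_meeting_point(k=200, theta0=0.5, s=1, t=1)
print(X, Y)  # prints 23.5 23.5
```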


Champernowne (1953) has written a paper on the most economic boundaries for the Beta 
distribution, but the values he gives for the meeting points are very different from the 
values given by the above equations. Actually Champernowne tabulates boundaries, not 
meeting points, and many of the differences are accounted for when we realize that the
farthest point reachable in sampling may be considerably nearer the origin than the meeting
point of the optimum boundaries. 

Suppose the meeting point is at $(x, y)$, then it may well be that at $(x, y-1)$ it pays to
accept, and at $(x-1, y)$ it pays to reject, while $(x-1, y-1)$ is a continuation point. Then
clearly sampling will never reach $(x, y)$, for if $(x-1, y-1)$ is reached, the next item sampled
will always lead to a decision. In order to find the last reachable point in sampling we
examine the points immediately off the neutral line, towards the origin from the meeting
point of the boundaries, to see when they start being continuation points. If we are
examining to see if $(x, y)$ is the last reachable point in sampling, we compare the continuation
and terminal decision risks for $(x, y-1)$ and $(x-1, y)$; if one (or both) of these is a
continuation point, then $(x, y)$ can be reached in sampling.

For any point, the terminal decision and continuation risks can be compared by
comparing the left- and right-hand sides of either equation (4) or (6) (or alternatively using the
expressions of equations (5) and (7)). Now suppose $(x, y)$ is a continuation point, and both
$(x+1, y)$ and $(x, y+1)$ are terminal decision points; then for $(x-1, y)$, given that sampling
is continued, the probability of acceptance is $(1-\theta)^{2}$, and the expected further sample
size is $1\cdot\theta + 2(1-\theta) = 2-\theta$. Thus for $\alpha$ corresponding to $(x-1, y)$, we have

$$A(\alpha, \theta) = (1-\theta)^{2} \quad\text{and}\quad S(\alpha, \theta) = 2-\theta.$$

We now substitute into equation (6), and if the left-hand side is greater than the right-hand
side then $(x-1, y)$ is a continuation point, otherwise it is a terminal decision 2 point. The
point $(x, y-1)$ can be examined in a similar way, and we proceed in this manner, examining
points immediately off the neutral line until we find one or both of such points to be
continuation points. We shall then have determined the last reachable point in sampling, as
described above.

An example has been worked out by this method for $\theta_0 = \tfrac{1}{2}$, $k = 200$, so that the boundaries
and decision risks are symmetrical. Equations (21) and (22) give the meeting point as
$X = Y = 23.5$. The last reachable point in sampling is (6, 6), since by the above method
(5, 6) and (6, 5) were found to be the farthest off-neutral line points which are continuation
points. Champernowne has given boundaries for this case, and they give the same point,
(6, 6), as the last reachable point (his boundaries cease here). For this example, the
boundaries determined by my method agree with those given by Champernowne with the
exception of (2, 0) which he gives as a continuation point, while he gives (0, 2) as a decision point
(which is correct). Clearly, the boundaries must be symmetrical. This is the only difference
I have found between my boundaries and Champernowne's.


7. EXAMPLE 3. A NORMAL PRIOR DISTRIBUTION 


Suppose that the distribution $\phi(x|\theta)$ is normal with a mean $m$ and unit variance, and that
the parameter distribution for $m$ is normal with mean $\mu$ and known variance $\sigma^{2}$. The two
decisions to be made are to accept (decision 1), or to reject (decision 2), the batch under
consideration, and we shall take the loss function to be $\alpha\,|m - m_0|$.

Clearly, the normal parameter distribution is closed under sampling with respect to
normally distributed observations. If a sample of $n$ observations has a mean $\bar{x}_n$, the
posterior distribution of $m$ is well known to be normal with a mean $(n\sigma^{2}\bar{x}_n + \mu)/(n\sigma^{2} + 1)$ and
variance $\sigma^{2}/(1 + n\sigma^{2})$, and this will be written $N(\mu_n, \sigma_n^{2})$.

The neutral line is therefore

$$\int_{-\infty}^{\infty} (m - m_0)\exp\left\{-\tfrac{1}{2}\left(\frac{m-\mu_n}{\sigma_n}\right)^{2}\right\} dm = 0,$$

which is
$$\mu_n = (n\sigma^{2}\bar{x}_n + \mu)/(n\sigma^{2} + 1) = m_0, \qquad (23)$$
or
$$\bar{x}_n = m_0 + (m_0 - \mu)/n\sigma^{2}. \qquad (24)$$

Thus for all $n$, given a position $(\bar{x}_n, n)$ on the neutral line, the value of $x_{n+1}$ required to
remain on the neutral line at $(n + 1)$ is $m_0$.

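In order to determine the meeting point of the boundaries we must evaluate equation (7),
and for this example $q(\alpha, \theta) = \Phi(m_0|m)$, where $\Phi(x|m)$ is the normal probability integral
with mean $m$ and unit variance. Equation (7) becomes

$$\frac{\alpha}{\sqrt{2\pi}\,\sigma_n}\int_{-\infty}^{\infty} (m_0 - m)\,\Phi(m_0|m)\exp\left\{-\tfrac{1}{2}\left(\frac{m-\mu_n}{\sigma_n}\right)^{2}\right\} dm = 1. \qquad (25)$$

The meeting point must be on the neutral line (23), and using this, equation (25) can be
written

$$\frac{\alpha\sigma_N}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \Phi(m_0|m)\; d\exp\left\{-\tfrac{1}{2}\left(\frac{m-m_0}{\sigma_N}\right)^{2}\right\} = 1,$$

where $N$ denotes the sample size at the meeting point. This equation is clearly

$$\frac{\alpha\sigma_N^{2}}{\sqrt{2\pi}\,\sqrt{1+\sigma_N^{2}}} = 1. \qquad (26)$$

On substituting for $\sigma_N^{2}$, we have a quadratic equation for $N$,

$$\alpha^{2}\sigma^{4} = 2\pi\,[1 + (N+1)\sigma^{2}]\,[1 + N\sigma^{2}],$$

and the only root of this which can be positive is

$$N = \sqrt{\frac{1}{4} + \frac{\alpha^{2}}{2\pi}} - \frac{1}{2} - \frac{1}{\sigma^{2}}. \qquad (27)$$

The meeting point is therefore specified by $\mu_N = m_0$, where $N$ is given by (27). The sample
mean $\bar{x}_N$ corresponding to $\mu_N$ is obtained from (24) and (27),

$$\bar{x}_N = m_0 + (m_0 - \mu)\Big/\left\{\sigma^{2}\sqrt{\frac{\alpha^{2}}{2\pi} + \frac{1}{4}} - \tfrac{1}{2}\sigma^{2} - 1\right\}. \qquad (28)$$

When $N$ is large, we have approximately

$$N \approx \alpha/\sqrt{2\pi}, \qquad (29)$$

hence
$$\bar{x}_N \approx m_0 + \sqrt{2\pi}\,(m_0 - \mu)/\alpha\sigma^{2}. \qquad (30)$$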


These functions appear to be of the right form. The sample size at the meeting point, $N$, is
an increasing function of the losses $\alpha$, and also of $\sigma^{2}$ which represents the precision of the
prior distribution. The effect of the parameter $\mu$ of the prior distribution on $\bar{x}_N$ decreases
with increasing $N$, and also decreases with increasing variance $\sigma^{2}$.
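
The closed forms for the normal-prior meeting point are easy to check numerically. The sketch below is illustrative only: the function names and the parameter values in the usage lines are hypothetical. It evaluates the root (27) and the sample mean from (24), and confirms that the root satisfies the quadratic equation for N from which it was derived.

```python
import math


def normal_meeting_point(alpha, sigma2, mu, m0):
    """Return (N, xbar_N): sample size from (27), sample mean from (24)."""
    n_meet = math.sqrt(0.25 + alpha**2 / (2.0 * math.pi)) - 0.5 - 1.0 / sigma2
    xbar = m0 + (m0 - mu) / (n_meet * sigma2)
    return n_meet, xbar


def quadratic_residual(alpha, sigma2, n_meet):
    """alpha^2 sigma^4 - 2 pi [1 + (N+1) sigma^2][1 + N sigma^2]; zero at the root."""
    return alpha**2 * sigma2**2 - 2.0 * math.pi * (
        1.0 + (n_meet + 1.0) * sigma2
    ) * (1.0 + n_meet * sigma2)


# Hypothetical values chosen only to exercise the formulas.
N, xbar_N = normal_meeting_point(alpha=100.0, sigma2=1.0, mu=0.0, m0=1.0)
large_N_approx = 100.0 / math.sqrt(2.0 * math.pi)  # equation (29)
```

For losses that are large relative to the cost of one observation, `N` and `large_N_approx` differ by roughly a constant, so the approximation is best read as a guide to the order of magnitude of N.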


8. CONDITIONS AND ASSUMPTIONS 


It is clearly of interest to know what are the necessary and sufficient conditions that
(i) the procedure for working back to find the optimum boundaries can be performed, and
(ii) that a meeting point exists. We shall reconsider assumptions A, B and C of § 4, and also
consider a fourth assumption, which we shall label D, that no sequence of sampling can
result in repeating a posterior distribution which has been reached before.

First, it is clearly not necessary for there to be a meeting point for (i) to be possible, as it
is sufficient if there is a 'last reachable point' in the continuation region, in the sense
described in example 2. Thus if the decision boundaries tend asymptotically to the neutral
line, (i) will be possible for discrete distributions $\phi(x|\theta)$. Further, assumption A is not
necessary either, for (i) is possible if sampling can jump a stage from, say, a point in the set $n$
to a point in the set $(n + 2)$; however, it must not be possible for a point to lead to points in
previous subsets, and assumption D is sufficient to ensure this, but may not be necessary.

Assumption B is really irrelevant, the essential point being that once a meeting point 
has been determined, we wish to be able to write down the risks for all points in the subset 
containing the meeting point. If what has been described as the meeting point is in fact 
a line or surface containing a (possibly infinite) number of points, then for (i) to be possible
all that is necessary is that a ‘last reachable point’ exists. If the neutral line has two or 
more branches, (i) is possible provided sampling is closed on each branch. 

Assumption C is introduced to cover the case where the $\xi$-space has at least three
dimensions, so that the meeting-point boundary is a line or surface, and it may be that for a given
origin in the $\xi$-space, the number of steps taken to reach this boundary is finite for only part
of the $\xi$-space, and otherwise infinite. For this situation, unless a last reachable point exists,
(i) will be possible for at most only parts of the $\xi$-space. For an example of this consider
testing among three binomial populations with probabilities $\theta_0$, $\theta_1$ and $\theta_2$, where $\theta_0 > \theta_1 > \theta_2$,
and three associated terminal decisions. There are parts of the $\xi$-space in which $\theta_0$ or $\theta_2$
have very small prior probabilities, so that the sequential procedure reduces to a test of 
two binomial populations, and clearly for the two separate cases of this, sampling may be 
closed for one and not for the other. For examples 1, 2 and 3 of this paper, the $\xi$-space is
two-dimensional, and the question does not arise. 

Now let us consider examples 1, 2 and 3. For example 3, the posterior distribution has 
a variance which is a decreasing function of n, and assumptions A, B and D hold. For 
example 2, similar remarks apply and for both examples 2 and 3, there exist explicit for- 
mulae for the meeting points. 

The $k$-point binomial distribution of example 1 is more complicated. Although assumptions
A, B and D all hold, there is not necessarily a meeting point; see Vagholkar & Wetherill
(1960) and Wetherill (1957). The situation can be given briefly as follows. Suppose the loss
functions are $k\,|\theta_i - \theta_0|$; then if we are sampling from a population with $\theta$ close to $\theta_0$, there
can be a stage in the sampling where all $\theta_i$'s except those either side of $\theta_0$ are negligible.
In such a situation, the optimum boundaries must approximate to those for a two-point
binomial, which are open and parallel. To exclude this possibility it is necessary to put a
condition on the loss functions, to ensure that it is not worth sampling for the two-point
binomial scheme based on these two particular ordinates (i.e. those on either side of $\theta_0$);
see Vagholkar & Wetherill (1960) and Vagholkar (1955).

It is interesting to notice that the two-point prior distribution (e.g. $k = 2$ in example 1)
is such that for nearly every possible prior distribution there is a sequence of sample values
$(n, r)$ which will repeat it. This is easily proved, for the posterior distribution is a simple
function of the likelihood ratio, $\lambda$, where

$$\log\lambda = r\log\{\theta_1(1-\theta_2)/\theta_2(1-\theta_1)\} + n\log\{(1-\theta_1)/(1-\theta_2)\}$$

and there is a sequence of values of $(n, r)$ which yield any given value of $\lambda$, provided the
logarithms on the right-hand side have a rational ratio.

Now if a closed sequence of points is possible in the $\xi$-space, sampling can enter a closed
loop, the sample size can be infinite, and the boundaries must be open. It would appear, 
therefore, that assumption D is necessary for a meeting point to exist.

I conjecture that, with 'reasonable' loss functions, and continuous probability densities
$\xi(\theta|x)$, assumption D is both necessary and sufficient for a meeting point to exist. (By
'reasonable' loss functions, I mean loss functions which are either monotonically increasing
or decreasing functions of $\theta$.)

The discussion in this section is incomplete, but it is hoped that sufficient has been said 
to give a lead to further work. 


9. GENERALIZATIONS 


We have assumed throughout that we have only two terminal decisions, but there is no 
difficulty in extending the theory to three or more decisions. With three decisions, for 
instance, the $\xi$-space can be divided into four regions, and there are in general two neutral
lines. 

The restriction to prior distributions closed under sampling could be abandoned, but this 
would yield much more complex results which are more difficult to apply, and I suspect 
that there is not a corresponding gain in generality. 


Sections 1 to 3 are based on work in my Ph.D. thesis, and I am greatly indebted to Prof. 
G. A. Barnard for his generous help throughout. I am also indebted to Prof. D. R. Cox for 
some helpful discussions during preparation of the paper, and to Prof. D. V. Lindley for 
some useful comments on an earlier draft of this paper.


Some properties of the Spearman estimator in bioassay*


By BYRON Wm BROWN, Jr 
School of Public Health, University of Minnesota, Minneapolis 


1. INTRODUCTION AND SUMMARY 


The quantal bioassay is one of the most frequent of biological experiments. There is need 
for simple analyses of the data from such experiments. The non-parametric Spearman 
procedures are satisfactory from the point of view of simplicity. Finney (1950, 1952a) has 
done some computations demonstrating that the bias of the Spearman estimator is small and 
the efficiency is high, if the dose-response function is normal or logistic and the doses are 
chosen close together and covering a wide range. Because the Spearman estimator can be 
computed without specifying the functional form of the dose-response function, it is 
desirable to know the properties of this estimator over a wide class of functions. In this 
paper Finney’s experiment is formalized as an ‘infinite experiment’, the Spearman esti- 
mator is defined for this experiment, and bounds on the bias of the estimator are obtained 
over specified broad classes of dose-response functions. The variance of the estimator is 
approximated and this approximation is investigated. The high efficiency reported by 
Finney is verified for the normal and logistic functions, and high efficiencies are computed 
for a number of other functions, but a family of functions is found for which the efficiency 
can be arbitrarily close to zero. 


2. SPEARMAN ESTIMATION AND PARAMETRIC ESTIMATION IN QUANTAL BIOASSAY 


In the usual quantal bioassay model, the mean (or median) of a distribution function,
$F(x)$, is estimated by performing the following experiment. $k$ values are specified on the
$x$-scale, $x_1, x_2, \ldots, x_k$. For each $x_i$, $n_i$ observations are taken on the random variable with
distribution $F(x)$. All that is observed is the number ($r_i$) of these $n_i$ observations that do
not exceed $x_i$. The $n_1 + n_2 + \ldots + n_k$ observations are assumed to be mutually independent.
The result of the experiment is a set of mutually independent binomially distributed
random variables, $r_i$, with means $n_i F(x_i)$ $(i = 1, 2, \ldots, k)$.

A practical application of this model is in drug assay where F(x) may be the probability 
of death for an animal given dose x (x may be a transform of the dose such as log dose). The 
purpose of the experiment may be to estimate the dose which will give 50% probability 
of death. The experiment consists of giving varying doses of the drug to groups of animals 
and recording the number of animals that die in the several dose groups. 

The usual estimation procedure (see Finney, 1952b) depends on a parametric representation
of $F(x)$. The median or mean is estimated as a function of the maximum likelihood
estimates of the unknown parameters. Berkson (1955) has recommended the use of other 
parametric estimators asymptotically equivalent to the maximum likelihood estimates but 
easier to compute. 


* Part of a doctoral dissertation prepared under the joint guidance of E. A. Johnson and I. R. Savage, 
with financial support through a NIH Research Training Grant.

If the dose levels ($x_i$) are $x_1, x_2 = x_1 + d, \ldots, x_k = x_1 + (k-1)d$, the non-parametric
Spearman estimator is defined as follows (Spearman, 1908):

$$\bar{x} = p_1(x_1 - \tfrac{1}{2}d) + \sum_{i=1}^{k-1}(x_i + \tfrac{1}{2}d)(p_{i+1} - p_i) + (1 - p_k)(x_k + \tfrac{1}{2}d)$$
$$\phantom{\bar{x}} = x_k + \tfrac{1}{2}d - d\sum_{i=1}^{k} p_i, \quad\text{where } p_i = r_i/n. \qquad (2.1)$$

The Spearman estimator compares favourably with the usual parametric estimators. 
It is simpler to compute, simpler in concept and non-parametric. Furthermore, exact 
distributional theory can be developed for it, and its efficiency is high. Results presented 
in this paper substantiate the last two points. 
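
To make definition (2.1) concrete, the short sketch below computes the estimator in both of its forms for equally spaced doses and a common group size n. The function names and the data in the usage lines are made up for illustration and are not taken from the paper.

```python
def spearman_estimate(doses, proportions):
    """First form of (2.1): weighted sum of interval mid-points."""
    k = len(doses)
    d = doses[1] - doses[0]  # common dose interval
    total = proportions[0] * (doses[0] - d / 2)
    total += sum(
        (doses[i] + d / 2) * (proportions[i + 1] - proportions[i])
        for i in range(k - 1)
    )
    total += (1 - proportions[-1]) * (doses[-1] + d / 2)
    return total


def spearman_estimate_short(doses, proportions):
    """Second form of (2.1): x_k + d/2 - d * sum(p_i)."""
    d = doses[1] - doses[0]
    return doses[-1] + d / 2 - d * sum(proportions)


# Hypothetical assay: 5 dose levels, n = 10 animals per level.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [0, 2, 5, 8, 10]           # r_i, number responding at each level
p = [r / 10 for r in responses]        # p_i = r_i / n
estimate = spearman_estimate(doses, p)
```

Both forms give the same value (3.0 for these made-up data, up to rounding), which illustrates why the second form is the convenient one for hand computation.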


3. THE INFINITE EXPERIMENT AND THE SPEARMAN ESTIMATOR 


The characteristics of the Spearman estimator are investigated here for an experiment
involving an infinity of dose levels, $x_i = x_0 + id$ $(i = 0, \pm 1, \pm 2, \ldots)$. The dose-mesh location,
$x_0$, and the dose interval, $d$, will be assumed arbitrarily chosen and fixed for the present.
At each dose level, $x_i$, $n$ observations will be taken. All observations are assumed mutually
independent.

The information for the binomial variable at dose level $x_i$ is $n_i F_i(1 - F_i)$. The information
at all but a finite number of dose levels is negligible. Therefore, results obtained in
investigating the infinite experiment will be relevant to finite experiments with dose levels
covering the range of information.

For this infinite experiment the Spearman estimator will be taken from (2.1) to be

$$\bar{x} = \sum_{i=-\infty}^{\infty} (x_i + \tfrac{1}{2}d)(p_{i+1} - p_i) \qquad (3.1)$$
or
$$\bar{x} = x_0 + \tfrac{1}{2}d + d\sum_{i=1}^{\infty}(1 - p_i) - d\sum_{i=-\infty}^{0} p_i. \qquad (3.2)$$


4. THE BIAS OF THE SPEARMAN ESTIMATOR FOR FIXED INFINITE DOSE MESH 


If $F(x)$ has a first moment $\mu$, the series defining the estimator (3.1) converges with
probability one and has mean

$$E(\bar{x}|x_0) = \sum_{i=-\infty}^{\infty} (x_i + \tfrac{1}{2}d)(F_{i+1} - F_i), \qquad (4.1)$$

where $F_i = F(x_i)$. The bias of $\bar{x}$, conditional on $x_0$ and $d$, is exactly analogous to the bias due
to grouping in the usual continuous distribution. The bias will be denoted by $B(\bar{x}|x_0)$ and
can be written

$$B(\bar{x}|x_0) = E(\bar{x}|x_0) - \mu$$
$$\phantom{B(\bar{x}|x_0)} = \sum_{i=-\infty}^{\infty} (x_i + \tfrac{1}{2}d)(F_{i+1} - F_i) - \int_{-\infty}^{\infty} x\,dF(x)$$
$$\phantom{B(\bar{x}|x_0)} = \sum_{i=-\infty}^{\infty} (x_i + \tfrac{1}{2}d - c_i)(F_{i+1} - F_i), \qquad (4.2)$$

where $c_i$ is defined as follows

$$c_i = \int_{x_i}^{x_{i+1}} x\,dF(x) \Big/ \int_{x_i}^{x_{i+1}} dF(x). \qquad (4.3)$$

Since $x_i \le c_i \le x_{i+1}$, expression (4.2) implies the following lemma.

Lemma 4.1. $|B(\bar{x}|x_0)| \le \tfrac{1}{2}d$.

It is obvious that for a one-point distribution with mass point located at one of the dose
levels, this dose level will be the mean, while the Spearman estimator will have value $\tfrac{1}{2}d$
units lower. Thus, the Spearman estimator attains the bound of $\tfrac{1}{2}d$. From this example,
it is clear that for any $F(x)$, if $d$ is large enough, the bias can approach $\tfrac{1}{2}d$.

In the following lemmas, tighter bounds on the bias are obtained and evaluated by
imposing restrictions on the class of dose-response functions. In Lemma 4.2, $F(x)$ is assumed
to have $s + 1$ derivatives and symmetrical density function. In Lemma 4.3, $F(x)$ is assumed
to have a unimodal density. Sharpness of the latter bound is shown.


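Lemma 4.2. Let $f(x)$ be the first derivative of $F(x)$. If $f(x)$ exists, is symmetrical, is
differentiable $s$ times, and if $f^{(v)}(x)$ has limit zero as $x \to \pm\infty$, for $v = 0, 1, 2, \ldots, s$, then

$$|B(\bar{x}|x_0)| \le d^{\,v+1}\,\sup_x |P_{v+1}(x)| \int_{-\infty}^{\infty} |f^{(v)}(x)|\,dx \quad\text{for } v = 0, 1, 2, \ldots, s, \qquad (4.4)$$

where $P_v(x)$ is the $v$th Bernoulli function

$$P_{2u}(x) = \sum_{k=1}^{\infty} \frac{\cos 2k\pi x}{2^{2u-1}(k\pi)^{2u}} \quad (u = 1, 2, \ldots), \qquad (4.5)$$
$$P_{2u+1}(x) = \sum_{k=1}^{\infty} \frac{\sin 2k\pi x}{2^{2u}(k\pi)^{2u+1}} \quad (u = 0, 1, 2, \ldots). \qquad (4.6)$$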


Proof. Let $\bar{x}_k$ be the Spearman estimator computed from the $2k + 1$ dose levels about $x_0$:

$$\bar{x}_k = \sum_{i=-k}^{k-1} (x_i + \tfrac{1}{2}d)(p_{i+1} - p_i) + p_{-k}(x_{-k} - \tfrac{1}{2}d) + (1 - p_k)(x_k + \tfrac{1}{2}d)$$
$$\phantom{\bar{x}_k} = x_0 - d\sum_{i=-k}^{k} (p_i - \tfrac{1}{2}). \qquad (4.7)$$

Then the expected value of $\bar{x}_k$, conditional on $x_0$, $d$ and $n$, and denoted by $E(\bar{x}_k|x_0)$, is as
follows

$$E(\bar{x}_k|x_0) = x_0 - d\sum_{i=-k}^{k} [F(x_0 + id) - \tfrac{1}{2}]. \qquad (4.8)$$

By the Euler-MacLaurin sum formula (see Cramér, 1951)

$$E(\bar{x}_k|x_0) = x_0 - d\left\{\int_{-k}^{k} [F(x_0 + xd) - \tfrac{1}{2}]\,dx + \tfrac{1}{2}[F(x_0 + kd) - 1 + F(x_0 - kd)] - d\int_{-k}^{k} P_1(x) f(x_0 + xd)\,dx\right\}. \qquad (4.9)$$

The expected value of $\bar{x}$, computed from the results over the infinite dose mesh, can be
obtained by taking the limit of $E(\bar{x}_k|x_0)$ as $k$ goes to infinity. This limit can be obtained by
taking limits of the several terms in (4.9). The limit of the second term in brackets in (4.9)
is zero, since $F$ is a distribution function. The first term in brackets in (4.9) can be written

$$\int_{-k}^{k} [F(x_0 + xd) - \tfrac{1}{2}]\,dx = \frac{1}{d}\int_{\mu - kd + (x_0 - \mu)}^{\mu + kd + (x_0 - \mu)} [F(y) - \tfrac{1}{2}]\,dy. \qquad (4.10)$$

Since $F(y)$ is symmetric, $F(y) - \tfrac{1}{2}$ is an odd function with respect to $y = \mu$, and the integral
from $\mu - kd + (x_0 - \mu)$ to $\mu + kd - (x_0 - \mu)$ is zero. Therefore, it follows that

$$\int_{-k}^{k} [F(x_0 + xd) - \tfrac{1}{2}]\,dx = \frac{1}{d}\int_{\mu + kd - (x_0 - \mu)}^{\mu + kd + (x_0 - \mu)} [F(y) - \tfrac{1}{2}]\,dy. \qquad (4.11)$$

Then, since $F$ is a distribution function, it follows from (4.11) that

$$\lim_{k \to \infty} \int_{-k}^{k} [F(x_0 + xd) - \tfrac{1}{2}]\,dx = \frac{x_0 - \mu}{d}. \qquad (4.12)$$

Using (4.12) and the limit of zero for the second term in brackets in (4.9), it follows that

$$E(\bar{x}|x_0) = \lim_{k \to \infty} E(\bar{x}_k|x_0) = \mu + \lim_{k \to \infty} d^{2}\int_{-k}^{k} P_1(x) f(x_0 + xd)\,dx. \qquad (4.13)$$

Making use of the relationship between Bernoulli functions

$$P_v'(x) = (-1)^{v-1} P_{v-1}(x), \qquad (4.14)$$

the integral in (4.13) can be integrated by parts to obtain

$$E(\bar{x}|x_0) = \mu + (-1)^{v(v+3)/2}\, d^{\,v+1} \int_{-\infty}^{\infty} P_{v+1}\!\left(\frac{x - x_0}{d}\right) f^{(v)}(x)\,dx. \qquad (4.15)$$

From (4.15) the bound on the bias, (4.4), can be obtained.
For $v = 0$ and $v = 1$, the bound (4.4) is

$$|B(\bar{x}|x_0)| \le \tfrac{1}{2}d \quad (v = 0), \qquad (4.16)$$
$$|B(\bar{x}|x_0)| \le \tfrac{1}{6}d^{2} f_m \quad (v = 1), \qquad (4.17)$$

where $f_m = \max_x f(x)$.

If $f(x)$ has two points of inflexion, (4.4) for $v = 2$ is simply written

$$|B(\bar{x}|x_0)| \le 0.0320\, d^{3} f_I', \qquad (4.18)$$

where $f_I'$ is the absolute value of $f'(x)$ at its points of inflexion.
The bounds on the bias, (4.4), depend on the assumption of symmetry of $f(x)$. The bound
for $v = 1$, (4.17), can be improved slightly if the assumption of unimodality of $f(x)$ is
substituted for the assumption of symmetry.


Lemma 4.3. If $F$ has first derivative $f(x)$ unimodal with $f_m = \max_x f(x)$, then

$$|B(\bar{x}|x_0)| \le \tfrac{1}{8}d^{2} f_m. \qquad (4.19)$$


Proof. In (4.2) the $i$th term in the expression for the bias, $B(\bar{x}|x_0)$, will be denoted by $B_i$.
This term can be written

$$B_i = (x_i + \tfrac{1}{2}d - c_i)(F_{i+1} - F_i) = \int_{x_i}^{x_{i+1}} (x_i + \tfrac{1}{2}d - x) f(x)\,dx. \qquad (4.20)$$

The unimodality of $f(x)$ implies that $f(x)$ is non-decreasing for $x \le x_m$, and non-increasing
for $x \ge x_m$, where $x_m$ is the mode. Using this property it can be shown that the maximum
absolute value of $B_i$, for $F_{i+1} - F_i$ fixed, is less than the value obtained for the step function,
$g(x)$, with one jump appropriately placed in each interval, $x_i \le x \le x_{i+1}$, to yield the
probability $F_{i+1} - F_i$ for the interval. For $x_{i+1} \le x_m$

$$g(x) = f(x_i) \quad (x_i \le x < x_i + R_i d),$$
$$g(x) = f(x_{i+1}) \quad (x_i + R_i d \le x < x_{i+1}),$$

where $R_i$ is determined by

$$\int_{x_i}^{x_{i+1}} g(x)\,dx = F_{i+1} - F_i.$$

For $x_i \ge x_m$, the function $g(x)$ would be defined similarly. For the interval containing $x_m$,
say $(x_0, x_1)$, $g(x)$ would have a jump in $(x_0, x_m)$ and another in $(x_m, x_1)$ with the jumps located
to yield appropriate probabilities, $F_m - F_0$ and $F_1 - F_m$, for the intervals.
Given the function $g(x)$, the values of the bias terms (4.20) for $g(x)$ can be obtained, and
these will be bounds for the $B_i$ computed from $f(x)$. For $x_{i+1} \le x_m$ the result is

$$|B_i| \le \tfrac{1}{2}d^{2} R_i(1 - R_i)(f_{i+1} - f_i). \qquad (4.21)$$

Since $0 \le R_i \le 1$, it follows that

$$|B_i| \le \tfrac{1}{8}d^{2}(f_{i+1} - f_i). \qquad (4.22)$$

Similar bounds for the intervals above the mode and for the interval containing the mode
can be obtained. When these bounds are summed, the result is independent of the $f_i$ and is
the bound for the bias given in (4.19).

The bound (4.19) on the bias of the Spearman estimator is sharp. This is easily verified
for $f(x)$ the uniform distribution and appropriate choice of $x_0$ and $d$.
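
The bounds of Lemmas 4.1 and 4.3 can be checked numerically for a particular dose-response function. The sketch below is illustrative only: it assumes a normal F with mean `mu` and standard deviation `sigma`, truncates the infinite mesh at an assumed `half_width` levels on either side of x0, and uses hypothetical function names. The conditional bias is computed from form (3.2) with the proportions replaced by their expectations.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def spearman_bias(x0, d, mu=0.0, sigma=1.0, half_width=200):
    """E(xbar | x0) - mu for a normal F, from (3.2) on a truncated mesh."""
    def F(i):
        return normal_cdf((x0 + i * d - mu) / sigma)

    upper = sum(1.0 - F(i) for i in range(1, half_width + 1))
    lower = sum(F(i) for i in range(-half_width, 1))
    return x0 + 0.5 * d + d * upper - d * lower - mu


d = 3.0                                 # a deliberately coarse mesh (d = 3 sigma)
f_max = 1.0 / math.sqrt(2.0 * math.pi)  # maximum of the standard normal density
bound_4_19 = d * d * f_max / 8.0        # Lemma 4.3
worst = max(abs(spearman_bias(j * d / 30.0, d)) for j in range(30))
```

For this coarse mesh `worst` stays below `bound_4_19`, and the bias vanishes when the mesh is placed symmetrically about the mean (x0 = mu).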


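Table 1. Functional forms commonly used for F(x)

Name             Functional form                                                        Range                                                  Variance
Logistic         $[1 + e^{-\beta(x-\mu)}]^{-1}$                                         $-\infty < x < \infty$                                 $\pi^{2}/3\beta^{2}$
Normal           $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\beta(x-\mu)} e^{-t^{2}/2}\,dt$  $-\infty < x < \infty$                                 $1/\beta^{2}$
Angular          $\sin^{2}[\beta(x-\mu) + \tfrac{1}{4}\pi]$                             $-\tfrac{1}{4}\pi \le \beta(x-\mu) \le \tfrac{1}{4}\pi$  $(\pi^{2}-8)/16\beta^{2}$
Uniform          $\beta(x-\mu) + \tfrac{1}{2}$                                          $-\tfrac{1}{2} \le \beta(x-\mu) \le \tfrac{1}{2}$        $1/12\beta^{2}$
One-particle or  $1 - \exp(-e^{x-\gamma-\mu})$                                          $-\infty < x < \infty$                                 $\tfrac{1}{6}\pi^{2}$
extreme value

Notes. (i) The means of all five distributions are $\mu$. (ii) For the first four distributions, $\beta > 0$. (iii) For
the fifth distribution, $\gamma$ denotes Euler's constant.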


Table 2. Bounds for the bias of the Spearman estimator for the infinite experiment

                                   Bound for the bias*
Tolerance distribution   As a proportion of $\sigma$†   As a proportion of $R$‡
Logistic                 $0.0567\rho^{2}\sigma$         $0.0866\rho'^{2}R$
Normal                   $0.0499\rho^{2}\sigma$         $0.0839\rho'^{2}R$
Angular                  $0.0427\rho^{2}\sigma$         $0.0804\rho'^{2}R$
Uniform                  $0.0361\rho^{2}\sigma$         $0.0750\rho'^{2}R$
One-particle or          $0.1013\rho^{2}\sigma$         $0.1561\rho'^{2}R$
extreme value

* Using (4.19) for the bound, $|B(\bar{x}|x_0)| \le (\tfrac{1}{8}d^{2}) f_m$.
† $\sigma$ is the standard deviation of the tolerance distribution and $d = \rho\sigma$.
‡ $R$ is the distance between the 20th and the 80th percentile and $d = \rho' R$.


Some frequently used functional forms for $F(x)$ are given in Table 1. In Table 2 the value
of the bound (4.19) is given for each of the functions of Table 1. From Table 2 it can be seen
that for these five functions the bias will be less than $0.10\sigma$ for $d$ less than $\sigma$. In keeping
with the non-parametric spirit of this investigation, and for purposes of quicker
comprehension and approximation from actual data, the bounds in Table 2 (and Table 4) are
presented also in terms of the difference ($R$) between the twentieth and eightieth percentiles.
For the five distributions of Table 1, $d$ less than $\tfrac{1}{2}R$ will result in a bias less than 4% of $R$.
In practice such a bias would be negligible.


5. THE VARIANCE OF THE SPEARMAN ESTIMATOR FOR FIXED INFINITE DOSE MESH 


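If $F(x)$ has a first moment $\mu$, then, for fixed $x_0$, the series defining the Spearman estimator,
(3.1), converges with probability one and the resulting estimate has variance $V(\bar{x}|x_0)$,
given by

$$V(\bar{x}|x_0) = \frac{d^{2}}{n}\sum_{i=-\infty}^{\infty} F(x_i)[1 - F(x_i)]. \qquad (5.1)$$

Let $V(\bar{x}|x_0)$ be approximated by the integral

$$\bar{V}(\bar{x}) = \frac{d}{n}\int_{-\infty}^{\infty} F(1 - F)\,dx. \qquad (5.2)$$

Lemma 5.1. For all $F$, $x_0$, $d$ and $n$

$$|V(\bar{x}|x_0) - \bar{V}(\bar{x})| \le \frac{d^{2}}{4n}. \qquad (5.3)$$
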

Proof. Let $x_m$ be a median of $F(x)$. Then $F(1-F)$ is non-decreasing for $x \le x_m$ and
non-increasing for $x \ge x_m$. Number the $x_i$ so that $x_0 \le x_m \le x_1$. It follows that

$$\int_{x_{i-1}}^{x_i} F(1-F)\,dx \le F_i(1-F_i)\,d \quad (i = 0, -1, -2, \ldots),$$
$$\int_{x_i}^{x_{i+1}} F(1-F)\,dx \le F_i(1-F_i)\,d \quad (i = 1, 2, 3, \ldots), \qquad (5.4)$$
$$\int_{x_0}^{x_1} F(1-F)\,dx \le \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot d.$$

Combining the inequalities of (5.4) it follows that

$$d\sum_{i=-\infty}^{\infty} F_i(1-F_i) + \tfrac{1}{4}d \ge \int_{-\infty}^{\infty} F(1-F)\,dx. \qquad (5.5)$$

Multiplying the inequality (5.5) by $d/n$ yields

$$V(\bar{x}|x_0) - \bar{V}(\bar{x}) \ge -\frac{d^{2}}{4n}. \qquad (5.6)$$

Inequalities analogous to those in (5.4) in the other direction can be written down and
from them it can be seen that $\tfrac{1}{4}d^{2}/n$ is also an upper bound for the difference, $V(\bar{x}|x_0) - \bar{V}(\bar{x})$.
The bound (5.3) is sharp. This is easily verified for $F(x)$ a two-point distribution with equal
probabilities at the two points, and with $d$ and $x_0$ appropriately chosen.
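
Lemma 5.1 can be illustrated numerically for the logistic form of Table 1, for which the integral of F(1 - F) over the whole line equals 1/β. The sketch below is illustrative only: the function names are hypothetical, the infinite sum in (5.1) is truncated at an assumed `half_width` levels on each side, and the distribution function is written with `tanh` purely to avoid overflow in the tails.

```python
import math


def logistic_cdf(x, mu=0.0, beta=1.0):
    """Logistic F(x) = 1 / (1 + exp(-beta (x - mu))), written via tanh."""
    return 0.5 * (1.0 + math.tanh(0.5 * beta * (x - mu)))


def conditional_variance(x0, d, n, half_width=200):
    """V(xbar | x0) from (5.1), with the infinite sum truncated."""
    total = 0.0
    for i in range(-half_width, half_width + 1):
        F = logistic_cdf(x0 + i * d)
        total += F * (1.0 - F)
    return d * d * total / n


def integral_variance(d, n, beta=1.0):
    """The integral approximation (5.2) for the logistic: (d / n) * (1 / beta)."""
    return d / (n * beta)


def max_deviation(d, n, grid=20):
    """Largest |V(xbar | x0) - integral approximation| over a grid of x0 in [0, d)."""
    return max(
        abs(conditional_variance(j * d / grid, d, n) - integral_variance(d, n))
        for j in range(grid)
    )
```

For example, `max_deviation(3.0, 10)` can be compared with the bound `3.0**2 / (4 * 10)` of (5.3); for a smooth F such as the logistic the deviation is typically far smaller than the bound.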

Further bounds in higher powers of $d$ can be obtained using Euler-MacLaurin expressions
for the variance (5.1). One result, for $F(x)$ symmetrical with two points of inflexion, is

$$|V(\bar{x}|x_0) - \bar{V}(\bar{x})| \le \frac{d^{3}}{3n} f_m, \quad\text{where } f_m = \max_x f(x). \qquad (5.7)$$

Table 3 gives the values of $\bar{V}(\bar{x})$ for the distributions of Table 1. Table 4 gives the bounds
on the relative deviation of $V(\bar{x}|x_0)$ from $\bar{V}(\bar{x})$, derived from (5.3) and (5.7), for the
distributions of Table 1.





Table 3. Values of $\bar{V}$ for several distributions

Distribution        $\bar{V}(\bar{x})$*
Logistic            $0.5513\rho\sigma^{2}/n$†    $1.2881\rho' R^{2}/n$‡
Normal              $0.5642\rho\sigma^{2}/n$     $1.5983\rho' R^{2}/n$
Angular             $0.5750\rho\sigma^{2}/n$     $2.0376\rho' R^{2}/n$
Uniform             $0.5774\rho\sigma^{2}/n$     $2.4942\rho' R^{2}/n$
One-particle or     $0.5404\rho\sigma^{2}/n$     $0.3508\rho' R^{2}/n$
extreme value

* $\bar{V}(\bar{x}) = \frac{d}{n}\int_{-\infty}^{\infty} F(1-F)\,dx$.
† $\sigma$ is the standard deviation of the tolerance distribution and $d = \rho\sigma$.
‡ $R$ is the distance between the 20th and the 80th percentile and $d = \rho' R$.





Table 4. Bounds for the relative deviation of $|V(\bar{x}|x_0) - \bar{V}(\bar{x})|$

                    Bound on $|V(\bar{x}|x_0) - \bar{V}(\bar{x})| / \bar{V}(\bar{x})$
Distribution        First bound*                        Second bound*
Logistic            $0.4535\rho$† or $0.6932\rho'$‡     $0.2741\rho^{2}$† or $0.6404\rho'^{2}$‡
Normal              $0.4431\rho$ or $0.7458\rho'$       $0.2357\rho^{2}$ or $0.6677\rho'^{2}$
Angular             $0.4347\rho$ or $0.8183\rho'$       $0.1981\rho^{2}$ or $0.7020\rho'^{2}$
Uniform             $0.4330\rho$ or $0.9000\rho'$       Not applicable
One-particle or     $0.4626\rho$ or $0.4126\rho'$       Not applicable
extreme value

* The first bound is computed from (5.3) and the second bound from (5.7). (See Table 3 for the values
of $\bar{V}$ for the several distributions.)
† $\sigma$ is the standard deviation of the tolerance distribution and $d = \rho\sigma$.
‡ $R$ is the distance between the 20th and the 80th percentile and $d = \rho' R$.


6. RANDOM LOCATION OF THE DOSE MESH 


In §§ 3-5, the location of the dose mesh, determined by $x_0$, was assumed to be fixed. If
$x_0$ is assumed to be chosen at random, with distribution uniform on the interval $(0, d)$ say,
the estimator, $\bar{x}$, will be unbiased. Denote the expectation over $x_0$ and over the conditional
binomial variables by $E(\bar{x})$. Then

$$E(\bar{x}) = \frac{1}{d}\int_{0}^{d} E(\bar{x}|x_0)\,dx_0$$
$$\phantom{E(\bar{x})} = \frac{1}{d}\int_{0}^{d} \sum_{i=-\infty}^{\infty} \left(x_0 + \frac{d}{2} + id\right) \int_{x_0 + id}^{x_0 + id + d} dF(x)\,dx_0. \qquad (6.1)$$

By integrating first with respect to $x_0$, then with respect to $F(x)$, and finally summing, the
expectation is easily shown to be $\mu$.
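
The unbiasedness under random mesh location can be illustrated numerically with a distribution that is not symmetric, for which the conditional bias at a fixed x0 is generally non-zero. The sketch below is illustrative only: it uses the one-particle (extreme value) form of Table 1, truncates the mesh at an assumed `half_width` levels on each side, replaces the integral over x0 in (6.1) by an average over equally spaced offsets (an assumption of the sketch), and uses hypothetical function names.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, as in Table 1


def extreme_value_cdf(x, mu=0.0):
    """F(x) = 1 - exp(-exp(x - gamma - mu)); this distribution has mean mu."""
    z = x - EULER_GAMMA - mu
    if z > 700.0:  # guard against overflow in exp; F is 1 to machine precision
        return 1.0
    return -math.expm1(-math.exp(z))


def conditional_bias(x0, d, mu=0.0, half_width=100):
    """E(xbar | x0) - mu from (3.2) with expected proportions, truncated mesh."""
    upper = sum(1.0 - extreme_value_cdf(x0 + i * d, mu)
                for i in range(1, half_width + 1))
    lower = sum(extreme_value_cdf(x0 + i * d, mu)
                for i in range(-half_width, 1))
    return x0 + 0.5 * d + d * upper - d * lower - mu


def mean_bias_random_mesh(d, mu=0.0, offsets=50):
    """Average of the conditional bias over x0 spread uniformly on (0, d)."""
    starts = [(j + 0.5) * d / offsets for j in range(offsets)]
    return sum(conditional_bias(x0, d, mu) for x0 in starts) / offsets
```

With d = 2, individual values of `conditional_bias(x0, 2.0)` vary with x0, while `mean_bias_random_mesh(2.0)` is zero to within numerical error.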

If the expectation over $x_0$ alone is denoted by $E_{x_0}$, and the bias of $\bar{x}$ conditional on $x_0$ is
denoted by $B(\bar{x}|x_0)$, then the mean square error or variance of $\bar{x}$ for random choice of $x_0$
can be written as follows

$$V(\bar{x}) = E_{x_0} E\{(\bar{x} - \mu)^{2}\,|\,x_0\}, \qquad (6.2)$$
$$V(\bar{x}) = E_{x_0}[V(\bar{x}|x_0)] + E_{x_0}[B^{2}(\bar{x}|x_0)]. \qquad (6.3)$$


The first term on the right of (6.3) can be written

\[
E_{x_0}[V(\bar{x}\mid x_0)]
  = \frac{1}{d}\int_0^d \frac{d^2}{n}\sum_{i=-\infty}^{\infty} F(x_0+id)\,[1-F(x_0+id)]\,dx_0
  = \frac{d}{n}\int_{-\infty}^{\infty} F(x)\,[1-F(x)]\,dx. \tag{6.4}
\]

The final expression in (6.4) is the integral denoted in expression (5.2) by V(x̄). No simple
expression has been found for the second term on the right of (6.3) but it follows from (4.16)
that this term is O(d²). Therefore, it follows that

\[
\mathcal{V}(\bar{x}) = V(\bar{x}) + O(d^2). \tag{6.5}
\]

Note that the second term of the variance for random x₀ is independent of n and of smaller
order in d than the first component, and that V(x̄) contains d only in the form of the factor
d/n, the inverse of the number of subjects tested per unit interval on the dose scale.
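
The averaging step in (6.4) can be checked numerically. The following is a minimal sketch, assuming a logistic tolerance distribution with unit scale (for which the integral of F(1 − F) equals 1) and illustrative values d = 0.8 and n = 5; the function names and the truncation of the infinite sum are assumptions made for the illustration only.

```python
import math

def logistic_cdf(x):
    """Logistic tolerance distribution with location 0 and scale 1 (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def conditional_variance(x0, d, n, cdf, span=60.0):
    """(d^2 / n) * sum_i F(x0 + i d) [1 - F(x0 + i d)], sum truncated at +/- span."""
    k = int(span / d) + 1
    total = 0.0
    for i in range(-k, k + 1):
        f = cdf(x0 + i * d)
        total += f * (1.0 - f)
    return d * d / n * total

def averaged_variance(d, n, cdf, steps=200):
    """Average of the conditional variance over x0 uniform on (0, d), midpoint rule."""
    values = (conditional_variance((j + 0.5) * d / steps, d, n, cdf)
              for j in range(steps))
    return sum(values) / steps

d, n = 0.8, 5
averaged = averaged_variance(d, n, logistic_cdf)
integral_form = d / n * 1.0  # (d/n) * integral of F(1 - F), which is 1 here
print(averaged, integral_form)
```

Under these assumptions the two printed values should agree closely, since averaging the mesh sum over x₀ turns it into the integral over the whole dose scale.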

The size of the experiment can be measured by the number of subjects tested per unit
interval on the dose scale, n′ = n/d. For fixed n′ and random location x₀, the following
theorem can be proved.

THEOREM 6.1. The variance of x̄ is a minimum, for fixed n′ and random x₀, when n = 1 and
d = 1/n′.

Proof.

\[
\mathcal{V}(\bar{x}) = \frac{1}{n'}\int_{-\infty}^{\infty} F(1-F)\,dx
                     + \frac{1}{d}\int_0^d B^2(\bar{x}\mid x_0)\,dx_0. \tag{6.6}
\]

The first term on the right of (6.6) is independent of the choice of n, for fixed n′. The
proof consists of demonstrating that the second term decreases if n is decreased to 1, so
that d becomes 1/n′. The manipulations involve breaking up the second term in (6.6), for
sample size n, into terms corresponding to the smaller intervals of size 1/n′, examining the
difference between biases, conditional on x₀, for the two situations and then integrating out
the x₀. The details will not be presented here.

Theorem 6.1 demonstrates that the dose mesh should be as fine as possible for the purpose
of estimation by the Spearman procedure.


7. ASYMPTOTIC EFFICIENCY OF THE SPEARMAN ESTIMATOR 


Suppose that n′ increases, and n = 1, d = 1/n′. Then it follows, for x₀ either fixed or
random, that x̄ is consistent with a variance that can be written

\[
\operatorname{var}(\bar{x}) = \frac{1}{n'}\int_{-\infty}^{\infty} F(1-F)\,dx
                            + O\!\left(\frac{1}{n'^2}\right). \tag{7.1}
\]

It is convenient, in the case of a sequence of random variables, to approximate the
variances by simpler terms correct to order n⁻¹. (When the sequence does not have variances,
the variances of a sequence of limiting distributions are used.) The terms of such a sequence
of approximations are called the asymptotic variances. In this sense the first term on the
right of (7.1) will be called the asymptotic variance of x̄ as n′ becomes large and will be
denoted by V_a(x̄):

\[
V_a(\bar{x}) = \frac{1}{n'}\int_{-\infty}^{\infty} F(1-F)\,dx. \tag{7.2}
\]

Cornfield & Mantel (1950) showed that the Spearman estimator and the maximum likelihood
estimator are approximately equal and that the approximation improves without
limit as d goes to zero. Bross (1950) and Haley (1952) obtained sampling distributions of
the Spearman estimator and the maximum likelihood estimator by enumeration for some
specific choices of dose mesh and sample size, for the logistic and normal tolerance
distributions respectively. Their results showed that the Spearman estimator was more
closely concentrated about the true value, μ, than was the maximum likelihood estimator
for every situation considered.

Finney (1950, 1952a) computed the mean-square error for the Spearman estimator and
the information for the experiment for several choices of x₀, and then averaged over x₀.
Enough dose levels were used so that the results were the same as would be obtained by
assuming the dose mesh to be infinite. Comparison of the reciprocal of the average
information over x₀ with the average square error over x₀ resulted in ratios of 0.9814 for
the normal tolerance distribution and 1.0000 for the logistic tolerance distribution.

Finney’s criterion for evaluating the Spearman estimator was used here. The concept of
information for the finite experiment was extended to the infinite experiment by
considering infinite series of information terms corresponding to the doses. The expectation
of this series was taken over x₀ with uniform distribution on (0, d). The result is the
information, I, for the infinite experiment,

\[
I = n'\int_{-\infty}^{\infty} \frac{[F_\mu(x)]^2}{F(x)\{1-F(x)\}}\,dx, \tag{7.3}
\]

where F_μ(x) is the partial derivative of the distribution with respect to the single unknown
parameter, μ.

The asymptotic efficiency, E, of the Spearman estimator is then defined as the ratio of
I⁻¹ to the asymptotic variance of the estimator, V_a(x̄):

\[
E = \frac{I^{-1}}{V_a(\bar{x})}
  = \left[\int_{-\infty}^{\infty} F(x)\{1-F(x)\}\,dx\right]^{-1}
    \left[\int_{-\infty}^{\infty} \frac{[F_\mu(x)]^2}{F(x)\{1-F(x)\}}\,dx\right]^{-1}. \tag{7.4}
\]

If E is evaluated for the distributions given in Table 1, the following results are obtained:
normal, E = 0.9814; logistic, E = 1.000; angular, E = 0.8106; one particle, E = 0.8319.
The definition of information for the uniform distribution is not applicable so the efficiency
cannot be computed. Note that the Spearman estimator has full asymptotic efficiency for
the infinite experiment when the tolerance distribution is logistic.
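
The normal and logistic values quoted above can be reproduced by direct quadrature of the two integrals in (7.4). The sketch below is illustrative only: it assumes μ is a location parameter (so that the derivative with respect to μ is minus the density), uses standard-scale distributions, and truncates the integrals to finite ranges chosen by hand; the helper names are not from the paper.

```python
import math

def integrate(g, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def efficiency(cdf, sf, pdf, lo, hi):
    """Reciprocal of the product of the two integrals in (7.4)."""
    variance_term = integrate(lambda x: cdf(x) * sf(x), lo, hi)
    information_term = integrate(lambda x: pdf(x) ** 2 / (cdf(x) * sf(x)), lo, hi)
    return 1.0 / (variance_term * information_term)

SQRT2 = math.sqrt(2.0)

# Standard normal; erfc is used for both tails to avoid cancellation.
e_normal = efficiency(
    cdf=lambda x: 0.5 * math.erfc(-x / SQRT2),
    sf=lambda x: 0.5 * math.erfc(x / SQRT2),
    pdf=lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi),
    lo=-10.0, hi=10.0,
)

# Standard logistic; its density equals F(1 - F).
e_logistic = efficiency(
    cdf=lambda x: 1.0 / (1.0 + math.exp(-x)),
    sf=lambda x: 1.0 / (1.0 + math.exp(x)),
    pdf=lambda x: 1.0 / ((1.0 + math.exp(-x)) * (1.0 + math.exp(x))),
    lo=-40.0, hi=40.0,
)

print(round(e_normal, 4), round(e_logistic, 4))
```

Because both integrals scale inversely with one another under a change of scale, the efficiency does not depend on the scale chosen for the tolerance distribution.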

The efficiency of the Spearman estimator is not necessarily high. Consider the following
family of symmetrical tolerance distributions with the single unknown parameter, μ,
appearing as a translation parameter


F(x; 2) = Ke) 1 Pi a eae (7-5) 





1+ 2e 


It can be shown that the asymptotic variance of the Spearman estimator for this family of
tolerance distributions goes to infinity as ε goes to zero, while the information is bounded
away from zero. Thus the efficiency for this family of tolerance distributions goes to zero
as ε goes to zero, i.e. as the distributions approach the Cauchy distribution. Since the
Spearman estimator simulates an arithmetic mean based on grouped data, this inefficiency
is not surprising in view of the known inadequacy of the actual arithmetic mean for the
Cauchy distribution itself.
Inconsistencies in a schedule of paired comparisons


By PATRICK SLATER 
Institute of Psychiatry, The Maudsley Hospital, University of London 


1. THE METHOD, AND THE HYPOTHESES CONCERNING IT 


The method of paired comparison has had a long and honourable history in psychological 
experiments, beginning with the researches of Witmer and Cohn, published in 1894. 
Titchener (1901) described it in detail in one of the earliest text-books on experimental 
psychology and Guilford (1954) devotes a chapter to it in the latest edition of his popular 
text-book. Theoretical investigations of the method, which has applications outside psy- 
chology, still continue to appear, e.g. by David (1959) in this Journal, in technical reports 
by Gulliksen & Tucker (1959) and in a thesis by the author (1960). The authoritative paper 
on the null hypothesis concerning it is the one by Kendall & Babington Smith which 
appeared here in 1939. With this I find myself in disagreement. 

The experimental procedure is to show a set of m objects to an individual in pairs and 
ask him each time to choose one. It is always understood that the objects differ from one 
another, but there may be doubt whether the difference is discernible by the individual. 
The difference may be confined to one respect, e.g. a set of boxes may be used identical in 
appearance but differing in weight, and the observer’s attention may be directed to that 
respect, e.g. by the instruction, ‘Choose the heavier each time’. Or they may differ in several 
respects and the criterion of choice may be left to the individual, e.g. in Titchener’s standard 
procedure the objects are coloured cards differing in hue and saturation and the individual 
is instructed to choose whichever he prefers. It is normally understood, but not always,
cf. Myers (1925), that each of the ½m(m − 1) possible pairs is presented once and once
only.

We shall assume that the objects may differ in several respects and that the individual’s 
attention has not been directed to any respect for which there is an independent criterion; 
also that he has been shown every possible pair once and is never permitted to evade 
the obligation to choose, e.g. by responding, ‘Both alike’. The objects will be denoted
A, B, ..., M.

Initially we may hope to show that the individual is aware of one dimension of preference,
in accordance with which the objects can be arranged in an order from most preferred to
least. The contrary, C₁, which must be disproved before any such hypothesis, H₁, need be
conceded, is that the individual is unaware of any differences between the objects and that
all his choices are made at random, independently of one another. It may be disproved if
an unexpectedly large number of the choices are internally consistent, i.e. cohere with the
same one out of all the m! possible orders for m objects; for in the absence of any criterion
all possible orders are equally admissible. The minimum number of inconsistent responses
will be denoted by i, and an order with which there are only i inconsistent responses will
be called a nearest adjoining order.

In some specimen schedules of responses the nearest adjoining order is not unique;
there may be several orders, say j altogether, with only i inconsistencies. The numbers
i and j are always ascertainable, at least in theory, if the schedule is checked against the m!
possible orders. Consider, for instance, A > B, A < C, B > C, a specimen schedule for
3 objects (read > as ‘preferred to’). One of the responses is inconsistent with each of the
three orders A > B > C, B > C > A and C > A > B. The other possible orders, A > C > B,
B > A > C and C > B > A, can be omitted from consideration, for two responses are
inconsistent with each of them. The nearest adjoining order is not unique so the
inconsistent response cannot be identified, but certainly i = 1 and j = 3.
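
The check against all m! orders described above can be sketched directly for this three-object schedule. The function name and the representation of a schedule as a set of (preferred, rejected) pairs are assumptions made for the illustration.

```python
from itertools import permutations

def nearest_adjoining_orders(objects, schedule):
    """Return (i, orders): the minimum number of inconsistent responses and
    every order attaining it. `schedule` holds one (preferred, rejected)
    pair for each pair of objects."""
    best, orders = None, []
    for order in permutations(objects):
        rank = {obj: pos for pos, obj in enumerate(order)}
        # A response is inconsistent when the preferred object ranks lower.
        bad = sum(1 for win, lose in schedule if rank[win] > rank[lose])
        if best is None or bad < best:
            best, orders = bad, [order]
        elif bad == best:
            orders.append(order)
    return best, orders

# A > B, A < C, B > C
schedule = {("A", "B"), ("C", "A"), ("B", "C")}
i, orders = nearest_adjoining_orders("ABC", schedule)
j = len(orders)
print(i, j, ["".join(o) for o in orders])
```

Exhaustive checking of this kind grows as m!, so it is only practical for small m.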

The sample space or universe for m = 3 contains 8 distinct specimens of possible
schedules of responses; and all are equiprobable on C₁. It is easy to verify that six have
i = 0 and two have i = 1. So a schedule with i = 0 is not exceptional when m = 3 and C₁
is tenable for all the specimens in this universe. In general the universe for m objects
contains $2^{\binom{m}{2}}$ equiprobable specimens. Let s_c be any specimen with a certain
number of inconsistencies, i.e. with i = c, and let f_m(c) be the frequency of occurrence of
all such specimens in the universe for m. The question to be decided is whether C₁ is tenable
concerning a particular s_c from this universe. If we make it our rule to reject C₁ when its
probability is below 0.05 we can reach a decision if we know the limiting value of i,
say i = u, for which

\[
\sum_{i=0}^{u} f_m(i)\Big/2^{\binom{m}{2}} < 0.05.
\]

Then we reject C₁ if c ≤ u but not otherwise.

Consider next the s_c with c ≤ u in different universes, for all of which H₁ must be admitted.
As m and consequently u are allowed to increase, c can increase indefinitely without
exceeding u. At some point it may begin to seem surprising that so many responses, which can
be itemized if the nearest adjoining order is unique, are all consistent with one another,
i.e. with one other ordering of the objects, which may be called the residual order; and we
may feel tempted to consider the more elaborate hypothesis, H₂, that the individual is
aware of two dimensions of preference. For the objects can be arrayed on a surface definable
by two axes, falling in the nearest adjoining order along one and in the residual order along
the other, so that every choice appears consistent with one or other of the two orders. The
contrary we now encounter, C₂, which must be disproved before H₂ need be conceded, is
that the choices inconsistent with the nearest adjoining order do not imply any awareness
of a second dimension but are all made at random and independently of one another. The
argument of § 3 below is that under the conditions of the experiment C₂ is always tenable,
no matter how large m and i are.


The hypotheses under consideration all relate to the $\binom{m}{2}$ responses the individual
is required to make. Each is the result of a single act of choice, potentially independent of
every other such act and liable to bring the laws of probability into operation. So I regard
the responses as the simple events from which the universe for m originates and conclude
that the probability distribution for i, the number of inconsistent responses, is what needs
to be examined when C₁ is under consideration, not the probability distribution for d,
the number of circular triads in a schedule, which is the variable considered by Kendall
& Babington Smith.

A triad is the set of responses relating to three objects. It may have i = 0 or 1, as already
mentioned, and it is circular when i = 1. The authors only give reasons of simplicity and
convenience for treating triads as units for enumeration. After describing them and larger
polyads within the complete configuration or m-ad which may be used to represent the
schedule of responses, they remark ‘it seems best to confine attention to circular triads,
which, so to speak, constitute the inconsistent elements in the configuration, and to ignore
the more ambiguous criteria associated with circular polyads of greater extent’. They do
not mention the possibility of treating each response as a unit, nor do they offer any explicit
definition of the null hypothesis to be considered.

Triads ought not to be treated as elements. They are compound events not conceivably
independent of one another, for the total number of triads in a schedule exceeds the number
of responses by a factor of ⅓(m − 2) and each response features in m − 2 triads. Moreover,
there is no 1:1 relationship between i and d; schedules from the same m with the same i
may differ in d, and vice versa. For instance,

when i = 1, d ranges from 1 to m − 2,
when i = 2, d ranges from 2 to 2m − 6;

and further evidence appears in Fig. 1 and Table 2. If the inconsistencies were subclassified
d might be defined as a weighted summation of the frequencies in specified classes: that is
to say, d may be viewed as a summation in which some inconsistencies receive more weight
than others. But on the assumption that the inconsistent responses result from erratic
acts of choice and occur at random there is no justification for subclassifying them. And
the conditions of the experiment do not include any region where this assumption can be
proved to have a negligible probability.

Inconsistent responses receive equal weights in Kendall’s procedure for τ (1938, 1948),
so there is a simple relationship between τ and i. A nearest adjoining order might be defined
as any order which maximizes τ for the schedule under consideration, and the maximum
value of τ is obtainable from i, given m, as 1 − 4i/m(m − 1).


2. THE FIRST FORM OF THE NULL HYPOTHESIS, C₁


On C₁ when two of the objects, I and J, are presented to the individual, since he is unaware
of any difference between them but obliged to make a choice, he is just as likely to choose
I as J, and his choice will not be influenced by any choices he may have made on any previous
occasion when he may have had I or J presented for comparison with any of the other
objects.

The universe of different schedules of responses thus obtainable consists of
$2^{\binom{m}{2}}$ specimens, all equiprobable. This total needs to be broken down into
subtotals for specimens where i = 0, 1, 2, .... Then to decide whether C₁ applies to a
particular schedule for m objects we need to find the number of inconsistencies in it and see
what proportion of the schedules in the universe contains no more than the same number of
inconsistencies.

Table 1 shows the breakdowns for m ≤ 8, and the cumulative proportions derived. If
C₁ is considered acceptable at the 0.05 probability level but not below, it is tenable for all
schedules where m < 6, but not when i = 0, m ≥ 6, when i = 1, m ≥ 7, or when i = 2, m ≥ 8.

The frequency distribution for any m may be defined as the expansion

\[
2^{\binom{m}{2}} = f_m(0) + f_m(1) + f_m(2) + \cdots.
\]

A general algebraic definition of f_m(i) would thus define the complete frequency
distribution for all m. Such an expression has not yet been found, but the expressions
for values of i up to 3 can be given. They are

\[
\begin{aligned}
f_m(0) &= m!,\\
f_m(1) &= m!\,(3m^2 - 13m + 14)/6,\\
f_m(2) &= m!\,(9m^4 - 78m^3 + 235m^2 - 438m + 680)/72,\\
f_m(3) &= m!\,(135m^6 - 1755m^5 + 8685m^4 - 27185m^3 + 77820m^2 - 157204m + 210336)/6480.
\end{aligned}
\]

The expression for f_m(2) only applies when m ≥ 4, and the expression for f_m(3) only when
m ≥ 6. When m = 5, f₅(3) = 24.

These expressions give

\[
\sum_{i=0}^{3} f_m(i)\Big/2^{\binom{m}{2}} = 0.009902
\]

for m = 9. So u is certainly not less than 3 when m ≥ 9. It may even exceed 3, and
appears to be increasing at an accelerating rate.
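
The closed expressions can be compared with a complete enumeration for small m. The sketch below is an illustration, not the National Physical Laboratory programme: it encodes each schedule as a bit pattern over the pairs, finds i by comparing the pattern with every transitive pattern, and then evaluates the cumulative proportion for m = 9 from the formulas. All names are assumptions.

```python
from collections import Counter
from itertools import combinations, permutations
from math import comb, factorial

def enumerate_f(m):
    """Frequency of each i over all 2^C(m,2) schedules (feasible for small m only)."""
    pairs = list(combinations(range(m), 2))
    masks = set()
    for order in permutations(range(m)):
        rank = {obj: pos for pos, obj in enumerate(order)}
        # Bit k is set when the first object of pair k precedes the second.
        masks.add(sum(1 << k for k, (a, b) in enumerate(pairs) if rank[a] < rank[b]))
    counts = Counter()
    for schedule in range(2 ** len(pairs)):
        counts[min(bin(schedule ^ mask).count("1") for mask in masks)] += 1
    return counts

def closed_form(m, i):
    """f_m(i) for i = 0..3 from the expressions in the text."""
    numerators = {
        0: (1, 1),
        1: (3 * m**2 - 13 * m + 14, 6),
        2: (9 * m**4 - 78 * m**3 + 235 * m**2 - 438 * m + 680, 72),
        3: (135 * m**6 - 1755 * m**5 + 8685 * m**4 - 27185 * m**3
            + 77820 * m**2 - 157204 * m + 210336, 6480),
    }
    num, den = numerators[i]
    return factorial(m) * num // den

counts_5 = enumerate_f(5)
proportion_9 = sum(closed_form(9, i) for i in range(4)) / 2 ** comb(9, 2)
print(dict(counts_5), round(proportion_9, 6))
```

For m = 5 the enumeration covers 1,024 schedules against 120 orders; the same approach becomes slow well before m = 8.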


Table 1. The frequency distribution of i for given values of m

 i    m = 2    m = 3    m = 4    m = 5     m = 6      m = 7        m = 8

Part 1. Frequencies

 0      2        6        24      120       720        5,040       40,320
 1      -        2        40      480     5,280       58,800      685,440
 2      -        -        -       400    13,280      278,880    5,120,640
 3      -        -        -        24    11,568      651,504   21,590,016
 4      -        -        -        -      1,920      736,848   55,101,312
 5      -        -        -        -        -        323,120   84,325,248
 6      -        -        -        -        -         41,040   71,687,040
 7      -        -        -        -        -          1,920   27,421,440
 8      -        -        -        -        -            -      2,464,000

Part 2. Cumulative proportions

 0     1.0     0.750    0.375   0.11719  0.02197    0.002403    0.000150
 1      -      1.0      1.0     0.58594  0.18311    0.030441    0.002704
 2      -        -        -     0.97656  0.58838    0.163422    0.021780
 3      -        -        -     1.0      0.94141    0.474083    0.102209
 4      -        -        -        -     1.0        0.825439    0.307477
 5      -        -        -        -        -       0.979515    0.621613
 6      -        -        -        -        -       0.999084    0.888668
 7      -        -        -        -        -       1.0         0.990821
 8      -        -        -        -        -          -        1.0
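
The limiting value u used in the decision rule follows mechanically from Part 1 of the table. A minimal sketch, with the frequencies copied from Table 1 and a hypothetical helper name:

```python
TABLE_1 = {
    5: [120, 480, 400, 24],
    6: [720, 5280, 13280, 11568, 1920],
    7: [5040, 58800, 278880, 651504, 736848, 323120, 41040, 1920],
    8: [40320, 685440, 5120640, 21590016, 55101312, 84325248,
        71687040, 27421440, 2464000],
}

def limiting_value(frequencies, level=0.05):
    """Largest i whose cumulative proportion is below `level`, or None."""
    total = sum(frequencies)
    cumulative, u = 0, None
    for i, count in enumerate(frequencies):
        cumulative += count
        if cumulative / total < level:
            u = i
        else:
            break
    return u

limits = {m: limiting_value(freqs) for m, freqs in TABLE_1.items()}
print(limits)
```

A value of None for m = 5 corresponds to the statement that C₁ is tenable for every schedule when m < 6.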


The data for m > 6 have been provided by the National Physical Laboratory using an
electronic computing programme developed by G. G. Alway as a research project. An
account of it will be published separately. Considerable expense would be incurred if
the research were continued to obtain complete expansions of $2^{\binom{m}{2}}$ for larger m
or expressions defining f_m(i) for larger i, so that it seems desirable to publish the present
results and to ascertain the consensus of expert opinion before proceeding. For most practical
purposes it would be sufficient to know the values of u for m ≤ 15. It is true that experiments
have been conducted with considerably more than 15 objects. Titchener (1901) regularly used
one with m = 27 for his course in experimental psychology and Cattell, Maxwell, Light &
Unger (1949) have described one with m = 50. Experiments with large m need not be
difficult to conduct if the objects are suitably chosen and appropriate apparatus is
constructed for presenting them in pairs and recording the responses automatically, but it is
not often that any compelling reasons for conducting such experiments are encountered in
practice.


3. THE SECOND FORM, C₂; AND AN ARGUMENT THAT EVERY INCONSISTENCY
SHOULD BE GIVEN AN EQUAL WEIGHT


The evidence of a single schedule of responses is never sufficient to make C₂ untenable.
Take A > B > ... > M arbitrarily as an order with which some of the individual’s responses
cohere. Then the remainder must all cohere with the opposite order A < B < ... < M.
If none are consistent with the first all are consistent with the second. So the i responses
inconsistent with the nearest adjoining order must a fortiori all be consistent with one
another, and in general may be linked together in many different ways to form possible
residual orders. In other words evidence in favour of H₂ is indistinguishable from
evidence against it, so C₂ cannot be ruled out. If we wish to disprove C₂ we must adduce
supplementary evidence from other sources, modify the conditions of the experiment in
some way or advance some specific argument.

Thus the only alternatives to be considered when investigating the internal consistency 
of a single schedule of responses are 

(i) C₁: the observer is unable to discriminate between the objects, or

(ii) H₁ + C₂: he is aware of a single dimension of preference. Choices not made in
accordance with it are produced by chance causes, i.e. causes operating independently on
particular judgements, such as distractions or momentary lapses of attention, etc.

There is never any case for pressing on to consider alternatives such as might be denoted
H₁ + H₂ + C₃, etc., without additional evidence.

We may argue from this that every inconsistency should be given an equal weight when
C₁ is under consideration. If the order A > B > ... > M is the dimension of preference
characteristic of the individual, a cause operating accidentally is just as likely to produce
the reversal A < M as A < D, say, and as no causes other than accidental causes need be
supposed, we ought not to assign more weight to one such reversal than another. A straight
count, that is to say, an unweighted summation of the inconsistencies is therefore the index
we should use in deciding whether H₁ or C₁ is to be preferred.

The proposition can be sustained, perhaps quite adequately, without reference to C₂.
For per contra we cannot claim that A < M should be given a greater weight than A < D
without postulating that A is further removed from M than from D on the scale of
preference characteristic of the individual. But this is to concede a form of H₁, and we
should not make any such concession before we have succeeded in disproving C₁. Moreover,
the relative weights we assign to A < D and A < M must depend on the particular form of H₁
we choose to concede; but even after disposing of C₁ we may be left with j > 1, i.e. with
several equally acceptable forms of H₁.
4. COUNTING i IN A SCHEDULE OF PAIRED COMPARISONS OBTAINED EXPERIMENTALLY


Mr Alway has kindly contributed the following practical notes: 

No simple rule for obtaining i is known to be applicable in all cases, but in all practical
applications encountered so far the following simple rules have sufficed.

First re-order the rows and columns of the preference matrix* so that the numbers of
+’s in successive rows are in descending order. Next, examine each row to the right of the
diagonal element, and proceeding element by element count separately the positive and
negative ones. If at any stage the number of negative elements exceeds the number of
positive ones the matrix may be transformed to decrease the total number of negative
elements in the upper triangular half. For example, if one of the rows (starting with the 
diagonal element) is veglaity gluing ar geal 
then by placing this row and the corresponding column nine places further on, the total 
number of negative elements in the upper half is reduced by 1. The resultant matrix can 
be examined again in this way. The columns should also be examined in a corresponding 
fashion; this is the same as examining the rows starting with the diagonal element and 
counting backwards towards the first element. The process should then be repeated, and 
wherever the count of —’s equals the count of +’s the matrix should be transformed in a 
corresponding fashion. This change will not of itself reduce the total number of —’s, but 
it may alter the position of certain elements so that the number of —’s in some row or 
column exceeds the count of +’s, and then the total number may be reduced. This process 
has sufficed in all practical applications (m < 10) to reduce the number of inconsistencies 
to its lowest value, and also to give all the permutations for which this lowest value is 
attained. 

Even when the first rule, to put the number of +’s in successive rows in descending
order, is omitted, the simplest case in which the second rule, of counting +’s and −’s by
row and column, fails by itself to give i is


++ —-+-4+ +4 
-. +++ 4+ 4+ 4+ 
--.++4+-4+- 
+--. +++ 4 
ee heal gr a hl 
+-+-- + + 
Se daa ee a 
ee ae ee 


for which i = 4, given by the permutation (24618357) of the rows and columns. 

The calculation of the data mentioned in §2 and the writing of this section have been 
carried out as part of the research programme of the National Physical Laboratory and they 
are published by permission of the Director of the Laboratory. 


* A preference matrix for recording the responses in a schedule is an m × m table with a row and
corresponding column for each object. The response I > J is recorded as +1, or simply +, in row I
column J, and as −1, or simply −, in row J column I.
5. KENDALL’S d AND ITS RELATION TO i


When one of three conjoined choices is inconsistent with the other two the triad formed is
circular. Kendall & Babington Smith’s procedure depends on finding the number, d,
of such triads in a schedule. The simple computing method for finding d is a great advantage
of their procedure. Counting the number of +’s in each row of the preference matrix A
provides a column vector a which is a partition of ½m(m − 1); and d can be obtained from


2d = m(m — 1) (2m—1)/6—a’a. 


       A₁        a₁            A₂        a₂            A₃        a₃
   . + - + +     3         . + + + -     3         . + - + +     3
   - . + + +     3         - . + + +     3         - . + + +     3
   + - . + +     3         - - . + +     2         + - . + -     2
   - - - . +     1         - - - . +     1         - - - . +     1
   - - - - .     0         + - - - .     1         - - + - .     1

Fig. 1. Three preference matrices, and their partitions.

For example, the three matrices in Fig. 1 with partitions as shown have

         a′a    d    i    j
   A₁    28     1    1    3
   A₂    24     3    1    1
   A₃    24     3    2    5

It appears debatable whether the inconsistency in A₁ should be given more or less weight
than the one in A₂. For instance, it might be argued from j that the one in A₂ is the more
reasonably attributable to some accidental cause, as it does not evoke any doubt about
what order represents the individual’s characteristic dimension of preference. My view,
based on the argument in § 3, is that both inconsistencies should receive the same weight.
Kendall’s procedure weights the inconsistency in A₂ three times as heavily as the one in A₁.

Moreover, Fig. 1 shows that no simple relationship exists between i and d: A₁ and A₂
have the same i but a different d, A₂ and A₃ have the same d but a different i. Table 2 shows
the relationship between i and d in the universes for m ≤ 8. The two quantities are quite
closely correlated, viz.

   In the universe for      r is
   m = 4                    0.9317
       5                    0.9087
       6                    0.9031
       7                    0.8969
       8                    0.8927


It is not surprising that r diminishes as m increases. Increasing m provides more freedom 
for preference matrices with the same partition to vary in the internal arrangements for 
their +’s and —’s. 

The correlation is not close enough to prevent different results being obtained when d
and i are used to test C₁ with reference to a single schedule of responses. When m = 7
d leads to the rejection of some A’s at the 5 % significance level where i = 1 and the
acceptance of others where i = 2; and when m = 8 to the rejection of some where i = 2 and the
acceptance of others where i = 3 or even 4. The 5.85 million possible schedules acceptable
in accordance with i and the 9.92 million acceptable in accordance with d when m = 8
include 4.80 million in common. There is disagreement about the remainder. Part of this
disagreement arises because i and d are discrete variables, so that the tails of their probability
distributions cannot be cut off exactly at the 0.05 level. In percentages, 2.18 pass the test
on i, 3.70 pass on d, 1.79 pass on both.

When several individuals, say n altogether, are asked to compare the same m objects 
in pairs and little evidence of agreement is found between them, the question may arise 
whether the absence of agreement reflects differences in taste or lack of discernment. It 
should be possible to extend the use of i or d to consider problems of this kind, and the 
correlation between them should be sufficient to lead to convergent conclusions when n is 
not too small. For n above a certain limit the advantage of easy computation might tell 
decisively in favour of d. 

I would like to emphasize the importance of distinguishing between problems of discern- 
ment and problems of agreement in this context. Comparison in pairs is specially appro- 
priate for problems of discernment; it provides more evidence of internal consistency, or 
the lack of it, than comparison in sets of more than two at a time. But for investigating 
problems of agreement it does not appear to have any advantages over other methods of 
multiple comparison, of which ranking is administratively the most convenient. 
Some circular coverage problems†


By GEORGE W. MORGENTHALER 
The Martin Co., Denver, Colorado 


1. THREE COVERAGE PROBLEMS 
The following three coverage problems are of mathematical and military interest. 
Problem 1. A damage circle of radius R is aimed at the centre, (0, 0), of a circular target
region of radius Z (Fig. 1). What is the expected or average coverage of the target circle if
the impact points (x, y) are circularly normally distributed, with standard deviation σ?


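[Figure: target circle centred on the aim point, with the damage circle centred on the random
impact point (x, y); the overlap A(x, y) is shaded.]

Fig. 1. Problem 1. Coverage by one aimed circle.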


Problem 2. A cluster of n circles is aimed at O = (0, 0), the centre of the target circle of
radius Z (Fig. 2). If the cluster circles are non-overlapping, each of radius R, and if
the centre of the cluster is circularly normally distributed about O with standard deviation σ,
what is the expected coverage?

Problem 3. A set of n independently aimed circles are impacted at the target circle of
radius Z (Fig. 3). The ith of the impacting circles is of radius R and is aimed at an aim point
A_i = (a_i, b_i) (i = 1, 2, 3, ..., n), each impact being circularly normally distributed about
its aim point with standard deviation σ. What is the expected coverage?

These three problems will be considered in the section below, which will include a sum- 
mary of the rather scattered work of a number of authors and also, it is believed, some new 
results under the following headings: 

(a) Introduction of the diffused exponential bomb (covering circle) as a means of arriving
at approximate solutions to the offset circle v. circle coverage problem.

(b) Use of the above offset circle v. circle diffused bomb model to solve the
cluster-of-circles coverage problem.

(c) Use of the above offset circle v. circle diffused bomb model to solve the problem of
several individually aimed circles v. a target circle.

† A partial summary of this paper was presented at the Seventeenth Annual meeting of the
Operations Research Society of America in New York, 21 May 1960. The paper is based on Martin
Company Report P-59-69.







[Figure: cluster of circles aimed at the aim point O, with the random impact point (x, y) of
the centre of mass of the cluster.]

Fig. 2. Problem 2. Coverage by a cluster of circles.

Fig. 3. Problem 3. Coverage by independently aimed circles.


2. EXPONENTIAL MODEL FOR EXPECTED COVERAGE WITH SINGLE CIRCLE 


Using the ‘cookie-cutter’ or circle of damage approach, the expected damage, E, may
be written as

(1)

where A(x, y) is the overlap when impact is at (x, y) (see shaded region in Fig. 1).
In Report P-59-69 of The Martin Co., the author has traced the interesting attempts to
solve this problem at the Rand Corporation and elsewhere. These include approximations
to A(x, y), use of squares or rectangles in place of circles, and special analogue integrating
devices. Germond (1950) has provided a solution in terms of the so-called offset circular
probability function, as follows. Let a² + b² = r², and

\[
p(R, r) = \frac{1}{2\pi}\iint_{(x-a)^2+(y-b)^2 \le R^2} \exp\{-\tfrac12(x^2+y^2)\}\,dx\,dy
        = \frac{1}{2\pi}\iint_{\xi^2+\eta^2 \le R^2}
          \exp[-\tfrac12\{(\xi+a)^2+(\eta+b)^2\}]\,d\xi\,d\eta. \tag{2}
\]


Then p(R, r) is the probability that a random point circularly normally distributed about
the origin with unit variance will fall within a circle of radius R about point (a, b). This
function and its complement, q(R, r), are called offset circular functions. q(R, r) can be shown
to be equal to

\[
q(R, r) = e^{-\frac12 R^2} + R\int_0^r e^{-\frac12(R^2+t^2)}\,I_1(Rt)\,dt, \tag{3}
\]

where I₁(Rt) is the first-order modified Bessel function of the first kind. The National
Bureau of Standards and the Rand Corporation have prepared tables of the function q(R, r)
in increments of 0.1 in the arguments (The Rand Corporation, 1952). Weil (1954) and
Laurent (1957) also describe methods for finding the probability of being within a distance
R of (a, b).
Using $p(R,r)$, Germond observed that the function

$$S(Z,R) = 2\pi\int_0^R \rho\,p(Z,\rho)\,d\rho, \tag{4}$$

which provides the solution of Problem 1, is equal to the expected overlap when a circle
of radius $R$ is dropped over a circle of radius $Z$ if the point of aim is the centre of the stationary
circle and if the impact distribution for the centre of the damage circle is circularly normally
distributed with unit standard deviation. Germond expressed $S(Z, R)$ in terms of the offset
circle function $p(Z, R)$ and the Bessel functions $I_0(ZR)$ and $I_1(ZR)$. A table of the relative
target coverage, $S(Z, R)/(\pi Z^2)$, is found in Germond (1950, pp. 15, 16). The table is in jumps
of 0.5 at first in $Z$ and then in unit jumps, and in 0.1 increments of $R$.

Von Neumann, Thompson (1958), and others utilized a ‘diffused target’ idea with success
in such problems. Here we shall diffuse the damage function. Instead of

$$P\{\text{damage of point }(\xi,\eta)\mid\text{impact at }(x,y)\} = \begin{cases}1 & \text{if } (x-\xi)^2+(y-\eta)^2\le R^2,\\ 0 & \text{otherwise},\end{cases}$$

define

$$P\{\text{damage of point }(\xi,\eta)\mid\text{impact at }(x,y)\} = \exp\left(-\frac{(x-\xi)^2+(y-\eta)^2}{2B^2}\right), \tag{5}$$

$B$ being a constant.


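The probability of damage to point $(\xi,\eta)$, given that $(x,y)$ is circularly normally
distributed about the origin with variance $\sigma^2$, is then

$$f(\xi,\eta) = \frac{1}{2\pi\sigma^2}\iint \exp\left(-\frac{(x-\xi)^2+(y-\eta)^2}{2B^2}\right)\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)dx\,dy = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{\xi^2+\eta^2}{2(\sigma^2+B^2)}\right). \tag{6}$$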
Table 1. B/Z as a function of R/Z

   R/Z       B/Z
 0.3919    0.2773
 0.3844    0.2720
 0.3772    0.2668
 0.3702    0.2619
 0.3635    0.2571

 0.3571    0.2525
 0.3508    0.2481
 0.3448    0.2438
 0.3390    0.2397
 0.3333    0.2357

 0.3279    0.2318
 0.3226    0.2281
 0.3175    0.2245
 0.3125    0.2210
 0.3077    0.2176

 0.3030    0.2143
 0.2985    0.2111
 0.2941    0.2080
 0.2899    0.2050
 0.2857    0.2020

 0.2817    0.1992
 0.2778    0.1964
 0.2740    0.1937
 0.2703    0.1911
 0.2667    0.1886

 0.2632    0.1861
 0.2597    0.1837
 0.2564    0.1813
 0.2532    0.1790
 0.2500    0.1768

 0.2469    0.1746
 0.2439    0.1725
 0.2410    0.1704
 0.2381    0.1684
 0.2353    0.1664

 0.2326    0.1644
 0.2299    0.1626
 0.2273    0.1607
 0.2247    0.1589
 0.2222    0.1571

 0.2198    0.1554
 0.2174    0.1537
 0.2151    0.1521
 0.2128    0.1504
 0.2105    0.1489

 0.2083    0.1473
 0.2062    0.1458
 0.2041    0.1443
 0.2020    0.1428
 0.2000    0.1414

   Z/R       B/Z
           0.0937
           0.0930
           0.0924
           0.0918
           0.0912

           0.0907
           0.0901
           0.0895
           0.0889
           0.0884

           0.0878
           0.0873
           0.0868
           0.0862
           0.0857

           0.0852
           0.0847
           0.0842
           0.0837
           0.0832

           0.0827
           0.0822
           0.0817
           0.0813
           0.0808

           0.0804
           0.0799
           0.0795
           0.0790
           0.0786

           0.0781
  9.10     0.0777
  9.15     0.0773
  9.20     0.0769
  9.25     0.0764

  9.30     0.0760
  9.35     0.0756
  9.50     0.0744
           0.0707
 12.00     0.0589

 15.00     0.0471
 20.00     0.0354
If $f(\xi,\eta)$ is now integrated over the target circle of radius $Z$, and divided by $\pi Z^2$, the
expected fraction of the target damaged, $E^*$, is obtained as

$$E^* = 2\left(\frac{B}{Z}\right)^2\left\{1-\exp\left(-\frac{Z^2}{2(\sigma^2+B^2)}\right)\right\}. \tag{7}$$


To use the approximate formula (7), a method for determining the constant $B$ must be
specified. $B$ will be chosen so that the expected damage when placing the diffused damage
function at the origin (without random impact) is the same as when placing the cookie-cutter
function at the origin. Assume that $R < Z$ both now and throughout the remainder
of this paper. Then

$$\pi R^2 = \iint_{\xi^2+\eta^2\le Z^2}\exp\left(-\frac{\xi^2+\eta^2}{2B^2}\right)d\xi\,d\eta$$

and hence

$$R^2 = 2B^2\left(1-e^{-\frac{1}{2}Z^2/B^2}\right). \tag{8}$$

Using this result, $B/Z$ can be determined from $R/Z$ or we may write $B = Zf(R/Z)$; numerical
results are given in Table 1.
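As a rough numerical illustration of formulae (7) and (8), the following Python sketch solves (8) for B by bisection and then evaluates (7). The function names (`diffused_area`, `solve_B`, `expected_fraction`) and the choice of bisection are assumptions made for this example; the paper itself relies on the tabulated values of Table 1.

```python
import math


def diffused_area(B, Z):
    """Right-hand side of (8): 2 B^2 (1 - exp(-Z^2 / (2 B^2)))."""
    return -2.0 * B * B * math.expm1(-Z * Z / (2.0 * B * B))


def solve_B(R, Z, tol=1e-12):
    """Solve (8) for B by bisection; assumes 0 < R < Z as in the text."""
    if not 0.0 < R < Z:
        raise ValueError("require 0 < R < Z")
    lo, hi = 0.0, Z
    # The right-hand side of (8) increases with B towards Z^2,
    # so widen the bracket until it exceeds R^2.
    while diffused_area(hi, Z) < R * R:
        hi *= 2.0
    while hi - lo > tol * Z:
        mid = 0.5 * (lo + hi)
        if diffused_area(mid, Z) < R * R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expected_fraction(R, sigma, Z=1.0):
    """Formula (7), with B taken from (8)."""
    B = solve_B(R, Z)
    return -2.0 * (B / Z) ** 2 * math.expm1(-Z * Z / (2.0 * (sigma ** 2 + B ** 2)))


if __name__ == "__main__":
    print(round(solve_B(0.2, 1.0), 4))              # compare with Table 1 at R/Z = 0.2000
    print(round(expected_fraction(0.6, 0.5), 4))    # compare with Table 2, sigma = 0.5, R/Z = 0.6
```

For small R/Z the exponential term in (8) is negligible and B is close to R/√2, which is why the tabulated B/Z values are nearly proportional to R/Z.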


Table 2. E* as computed by exponential approximation and by
‘cookie-cutter’ (Rand Table) for Z = 1

                      E* from        E* from equations
   σ       R/Z       Rand table        (7) and (8)
  0.1      0.2         0.0400            0.0400
           0.4         0.1600            0.1596
           0.6         0.3600            0.3560
           0.8         0.6385            0.6330
           0.9         0.7942            0.8135
           0.999       0.9203            0.9984
  0.2      0.2         0.0400            0.0400
           0.4         0.1599            0.1577
           0.6         0.3574            0.3436
           0.8         0.6106            0.6120
           0.9         0.7356            0.7976
           0.999       0.8412            0.9982
  0.5      0.1         0.0086            0.0087
           0.2         0.0342            0.0337
           0.3         0.0756            0.0735
           0.6         0.2766            0.2632
           0.8         0.4464            0.4947
           0.999       0.6142            0.9969
  1.0      0.2         0.0156            0.0155
           0.4         0.0611            0.0594
           0.6         0.1323            0.1334
           0.8         0.2231            0.2899
           0.9         0.2737            0.4896
           0.999       0.3263            0.9923


Because it is intended that this exponential approximation be used in the other two
coverage problems described in § 1, it is interesting to determine the adequacy of the
approximation. In Table 2, the exponential approximation given in formulae (7) and (8) is compared
with the value of $E^*$ obtained from the tables of $S(Z, R)/(\pi Z^2)$ found in Germond (1950).
The data indicate good agreement for $R/Z < 0.8$ and for values of $\sigma$ such that $\sigma/Z < 1$.
Agreement is generally better for smaller $\sigma$'s and $R/Z$ values, such as will prevail in the
cluster and salvo problems.

In the cluster problem (Problem 2) and in the independently aimed salvo (Problem 3),
the exponential approximation will be replacing a ‘cookie-cutter’ damage circle that is
aimed off-centre. We may ask how adequately the approximation will fit in this case. The
formula (7) is easily extended to give the coverage of an impact circle aimed at an offset
point $(a,b)$. In place of (6) we have

$$f(\xi,\eta) = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{(\xi-a)^2+(\eta-b)^2}{2(\sigma^2+B^2)}\right). \tag{9}$$


The fraction of coverage, $E^*$, is then obtained by integrating $f(\xi,\eta)$ over the circle of radius
$Z$ and dividing by $\pi Z^2$. In fact, using the definition of equation (2) for $p(R,r)$ we have

$$E^* = 2\left(\frac{B}{Z}\right)^2 p(R,r), \tag{10}$$

where $R^2 = Z^2/(\sigma^2+B^2)$, $r^2 = (a^2+b^2)/(\sigma^2+B^2)$.
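Formula (10) can be evaluated without tables. The sketch below is an illustration only: it computes the offset circular probability p(R, r) from the standard Poisson-mixture series for the noncentral chi-square distribution with two degrees of freedom (a substitute for the Rand tables, not the method of the paper), and the names `offset_circle_p` and `offset_coverage` are hypothetical. B is passed in directly and is assumed to have been obtained from (8).

```python
import math


def offset_circle_p(R, r, tol=1e-15, max_terms=2000):
    """p(R, r): probability that a unit circular normal point falls within
    distance R of a point at distance r from the mean.

    Uses p = sum_j Poisson(j; r^2/2) * P(chi-square with 2(j+1) d.f. <= R^2).
    Intended for moderate arguments (exp(-r^2/2) must not underflow).
    """
    x = 0.5 * R * R
    lam = 0.5 * r * r
    weight = math.exp(-lam)      # Poisson weight for j = 0
    term = math.exp(-x)          # e^{-x} x^j / j! for j = 0
    cum = term                   # e^{-x} * sum_{i<=j} x^i / i!
    total = 0.0
    for j in range(max_terms):
        total += weight * max(0.0, 1.0 - cum)
        weight *= lam / (j + 1)
        term *= x / (j + 1)
        cum += term
        if j > lam and weight < tol:
            break
    return total


def offset_coverage(Z, B, sigma, a, b):
    """Formula (10): expected fraction covered for aim point (a, b)."""
    s2 = sigma ** 2 + B ** 2
    R = Z / math.sqrt(s2)
    r = math.hypot(a, b) / math.sqrt(s2)
    return 2.0 * (B / Z) ** 2 * offset_circle_p(R, r)


if __name__ == "__main__":
    # Offset 0.33, R = 0.5 (B about 0.3571 from (8)), sigma = 0.08, Z = 1.
    print(round(offset_coverage(1.0, 0.3571, 0.08, 0.33, 0.0), 3))
```

With r = 0 the series collapses to 1 − exp(−R²/2), so (10) reduces to (7), which gives a quick consistency check.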


As mentioned earlier, $p(R,r)$ is tabled in Rand Corporation (1952), and hence $E^*$ can be
computed. However, there appear to be no tabled values for this offset circle v. circle
problem against which the values of $E^*$ computed by (10) might be compared. In view of
this, several hand-run Monte Carlo trials were performed.

Using the Rand set of random normal deviates (Rand Corporation, 1955) and large
paper with a grid of lattice points, four off-centre circle v. circle coverage experiments were
tried. The $E^*$ obtained is compared in Table 3 with $E^*$ as computed from formula (10).
The estimation of overlap area for each impact was obtained by counting the number of
lattice points covered in the target circle. In addition, four Monte Carlo trials (indicated
by p) were conducted in which the overlap area was planimetered.
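A machine version of such a Monte Carlo check might look like the following sketch. It is not the procedure of the paper: instead of lattice-point counting or a planimeter it uses the closed-form area of intersection of two circles, and it draws pseudo-random normal deviates from Python's `random` module rather than the Rand tables. The function names, the seed and the number of trials are all assumptions of this example.

```python
import math
import random


def overlap_area(d, R, Z):
    """Area common to circles of radii R and Z whose centres are d apart."""
    if d >= R + Z:
        return 0.0
    if d <= abs(Z - R):
        return math.pi * min(R, Z) ** 2
    # Clamp arguments to guard against rounding just outside [-1, 1].
    ca = max(-1.0, min(1.0, (d * d + R * R - Z * Z) / (2.0 * d * R)))
    cb = max(-1.0, min(1.0, (d * d + Z * Z - R * R) / (2.0 * d * Z)))
    kite = 0.5 * math.sqrt(max(0.0, (-d + R + Z) * (d + R - Z) * (d - R + Z) * (d + R + Z)))
    return R * R * math.acos(ca) + Z * Z * math.acos(cb) - kite


def monte_carlo_fraction(offset, R, sigma, Z=1.0, trials=20000, seed=1):
    """Average fraction of the target circle covered by a cookie-cutter circle
    of radius R aimed at (offset, 0) with circular normal error sigma."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = offset + rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        total += overlap_area(math.hypot(x, y), R, Z)
    return total / (trials * math.pi * Z * Z)


if __name__ == "__main__":
    print(monte_carlo_fraction(0.33, 0.5, 0.08))
```

Because the overlap is computed exactly for each impact, the only uncertainty in this sketch is the sampling error of the Monte Carlo average.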


Table 3. Comparison of computed E* with Monte Carlo E* for eight cases.
Target circle of radius Z = 1

  Offset                               No. of      Monte Carlo       E* computed
 distance      R       σ      R/Z      trials          E*              by (10)
 0.33(p)     0.50    0.08    0.50       100       0.250 ± 0.001         0.237
 0.33(p)     0.50    0.50    0.50        75       0.189 ± 0.019         0.173
 0.50        0.40    0.17    0.40        75       0.152 ± 0.008         0.144
 0.67(p)     0.33    0.06    0.33       100       0.108 ± 0.001         0.099
 0.67        0.60    0.17    0.60        25       0.264 ± 0.026         0.256
 0.67(p)     0.50    0.33    0.50        50       0.170 ± 0.021         0.163
 0.83        0.20    0.17    0.20        50       0.030 ± 0.004         0.030
 1.17        0.25    0.33    0.25        50       0.023 ± 0.006         0.017


The tolerances indicated about the Monte Carlo estimates of $E^*$ are based upon the sample
range (see Burington & May, 1953, p. 167). These tolerances show the uncertainty due to
Monte Carlo, but not the error in determining overlap area. The indicated tolerance in
Table 3 is $\pm 3s/\sqrt{N}$, where $N$ is the number of Monte Carlo trials and $s$ is the estimated
standard deviation.
Assuming an independent elliptical normal distribution of impact points $(x,y)$ about an
aim point (0, 0) with standard deviations $\sigma_x$ and $\sigma_y$ in the $x$ and $y$ directions, one can obtain
the following for the exponential damage function:

$$f(\xi,\eta) = \frac{B^2}{\sqrt{(\sigma_x^2+B^2)(\sigma_y^2+B^2)}}\exp\left\{-\frac{\xi^2}{2(\sigma_x^2+B^2)}-\frac{\eta^2}{2(\sigma_y^2+B^2)}\right\}.$$

Then this gives

$$E^* = 2\left(\frac{B}{Z}\right)^2\left\{\frac{1}{2\pi\sqrt{(\sigma_x^2+B^2)(\sigma_y^2+B^2)}}\iint_{\xi^2+\eta^2\le Z^2}\exp\left[-\frac{\xi^2}{2(\sigma_x^2+B^2)}-\frac{\eta^2}{2(\sigma_y^2+B^2)}\right]d\xi\,d\eta\right\}.$$

The expression in braces involves the integration of an elliptical normal function over
a co-centred circular region. Approximations for this expression have been found by Oberg
(1947); Harter (1959) has provided tables. Fettis (1957, p. 13) has shown that upon setting


Z o2 + B 
= aeBy °~ ote) 


(and switching the roles of $\sigma_x$ and $\sigma_y$ if $\sigma_x > \sigma_y$) the probability mass over the circle of radius $Z$ is


[eta(e-")-a(e*4)} -ala(c+2)-a(c-1)}} 02) 


where $q(R, r)$ is the offset circle probability function previously defined.








3. MODELS FOR CLUSTER COVERAGE (PROBLEM 2) 


A. Replacement of cluster with an annulus 


It seems reasonable to replace the cluster of non-overlapping damage circles with an
annulus of equivalent area. This automatically includes any rotation of the cluster, i.e.
variation in $\theta$ (Fig. 4). In the examples below, the mean annulus radius is made to coincide
with $A$, the radial distance from cluster centre of mass to the centre of any cluster circle.

If there are $n$ circles in the cluster and $2\Delta$ is the annulus width, the equation for equating
damage areas and determining $\Delta$ is

$$n\pi R^2 = \pi[(A+\Delta)^2-(A-\Delta)^2] \quad\text{or}\quad \Delta = \frac{nR^2}{4A}. \tag{13}$$


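Using the exponential approximation twice, we may approximate the expected coverage by

$$E^* = 2\left\{\left(\frac{B_1}{Z}\right)^2\left(1-\exp\left[-\frac{Z^2}{2(\sigma^2+B_1^2)}\right]\right)-\left(\frac{B_2}{Z}\right)^2\left(1-\exp\left[-\frac{Z^2}{2(\sigma^2+B_2^2)}\right]\right)\right\}, \tag{14}$$

where

$$B_1 = Zf\left(\frac{A+(nR^2/4A)}{Z}\right), \qquad B_2 = Zf\left(\frac{A-(nR^2/4A)}{Z}\right),$$

the function $f$ being that defined by the relation $B = Zf(R/Z)$ in § 2 above.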
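The annulus model can be sketched numerically as follows. This is an illustrative reading of (13) and (14), not code from the paper: the function names are hypothetical, B₁ and B₂ are found by bisection on relation (8), and the example parameters (n = 4, A = 0.4, Z = 1, with R = 0.4/√2 so that four circles are tangent at radial distance 0.4) are an assumption chosen to correspond to the Monte Carlo comparison.

```python
import math


def solve_B(R, Z):
    """Solve R^2 = 2 B^2 (1 - exp(-Z^2 / (2 B^2))) for B (0 < R < Z) by bisection."""
    def rhs(B):
        return -2.0 * B * B * math.expm1(-Z * Z / (2.0 * B * B))
    lo, hi = 0.0, Z
    while rhs(hi) < R * R:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rhs(mid) < R * R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def single_circle_fraction(B, sigma, Z):
    """Formula (7) for a given B."""
    return -2.0 * (B / Z) ** 2 * math.expm1(-Z * Z / (2.0 * (sigma ** 2 + B ** 2)))


def annulus_coverage(n, R, A, sigma, Z=1.0):
    """Formula (14): outer-disc coverage minus inner-disc coverage."""
    delta = n * R * R / (4.0 * A)          # half-width of the annulus, from (13)
    outer, inner = A + delta, A - delta
    if not (0.0 < inner and outer < Z):
        raise ValueError("sketch assumes 0 < A - delta and A + delta < Z")
    B1 = solve_B(outer, Z)
    B2 = solve_B(inner, Z)
    return single_circle_fraction(B1, sigma, Z) - single_circle_fraction(B2, sigma, Z)


if __name__ == "__main__":
    R = 0.4 / math.sqrt(2.0)               # assumed radius: four tangent circles at A = 0.4
    for sigma in (0.25, 0.40, 0.60):
        print(sigma, round(annulus_coverage(4, R, 0.4, sigma), 3))
```

The subtraction mirrors the construction of the annulus as a disc of radius A + Δ with a disc of radius A − Δ removed, each treated by the single-circle approximation.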

Table 4 shows the results of some Monte Carlo trials in cases that were run to check the
accuracy of formula (14). In each case a cluster of four tangential damage circles of radius
$R = 0.4/\sqrt{2}$ was employed. This meant that $A = 0.4$. $Z$ was chosen as 1. The uncertainty given
for the Monte Carlo value is again computed as $\pm 3s/\sqrt{N}$, $s$ being estimated from the sample
range. Lattice-point counting was the basis for estimating the areas of coverage in three cases and may
have introduced some error. In addition, two Monte Carlo trials (indicated by p) were run
with overlap measured by planimeter. Agreement is just about acceptable. However, it
appears that the formula is biased in giving estimates that are about 5 to 9% low. This
same phenomenon, already detectable in Table 3, will be seen to occur again in other
approximations involving the exponential expression. Perhaps it is characteristic of the
spreading out of the damage mass, and could be adjusted for by a constant.





Fig. 4. Replacement of cluster of six circles by an annulus. 


Table 4. Comparison of five Monte Carlo cluster trials with the
exponential model (n = 4, A = 0.4, R = 0.4/√2, Z = 1)

               No. of       Monte Carlo        Value of E*          Value of E*
    σ          trials       value of E*        computed by          computed by
                                               formula (14)         formula (22)
  0.20(p)        50        0.317 ± 0.004          0.302                0.311
  0.25           30        0.308 ± 0.012          0.294                0.303
  0.40           40        0.276 ± 0.020          0.257                0.267
  0.60(p)        50        0.221 ± 0.030          0.202                0.206
  0.80           30        0.163 ± 0.042          0.153                0.154


B. Rotating cluster model 
If the centre of mass impacts at $(x, y)$, a fixed damage circle will impact at
$(x + A\cos\theta,\ y + A\sin\theta)$. Then

$$P\{(\xi,\eta)\text{ is damaged}\mid\text{impact of centre of mass at }(x,y)\} = \exp\left(-\frac{(\xi-x-A\cos\theta)^2+(\eta-y-A\sin\theta)^2}{2B^2}\right)$$

and

$$f(\xi,\eta,A,\theta) = \frac{1}{2\pi\sigma^2}\iint \exp\left(-\frac{(\xi-x-A\cos\theta)^2+(\eta-y-A\sin\theta)^2}{2B^2}\right)\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)dx\,dy$$

$$\phantom{f(\xi,\eta,A,\theta)} = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{(\xi-A\cos\theta)^2+(\eta-A\sin\theta)^2}{2(\sigma^2+B^2)}\right). \tag{15}$$


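If $\theta$ is uniformly distributed, the average damage for the circle is

$$f(\xi,\eta,A) = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{\xi^2+\eta^2+A^2}{2(\sigma^2+B^2)}\right)\frac{1}{2\pi}\sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{A}{\sigma^2+B^2}\right)^m\int_0^{2\pi}[\xi\cos\theta+\eta\sin\theta]^m\,d\theta. \tag{16}$$

In the expansion, set $m = 2j$ and use the following from Watson (1952, p. 21):

$$\int_0^{2\pi}[\xi\cos\theta+\eta\sin\theta]^m\,d\theta = \begin{cases}0 & \text{if } m \text{ is odd},\\[4pt] \dfrac{2\,\Gamma(\tfrac{1}{2}(m+1))\,\Gamma(\tfrac{1}{2})}{\Gamma(\tfrac{1}{2}(m+2))}\,(\xi^2+\eta^2)^{m/2} & \text{if } m \text{ is even}.\end{cases}$$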


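Then,

$$f(\xi,\eta,A) = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{\xi^2+\eta^2+A^2}{2(\sigma^2+B^2)}\right)\sum_{j=0}^{\infty}\frac{1}{(2j)!}\left(\frac{A}{\sigma^2+B^2}\right)^{2j}\frac{\Gamma(j+\tfrac{1}{2})}{\Gamma(\tfrac{1}{2})\,\Gamma(j+1)}\,(\xi^2+\eta^2)^j$$

$$\phantom{f(\xi,\eta,A)} = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{\xi^2+\eta^2+A^2}{2(\sigma^2+B^2)}\right)\sum_{j=0}^{\infty}\left(\frac{1}{j!}\right)^2\left(\frac{A}{2(\sigma^2+B^2)}\right)^{2j}(\xi^2+\eta^2)^j. \tag{17}$$

To find the expected damage from this particular damage circle, $f(\xi,\eta,A)$ must be
integrated over the target circle of radius $Z$. To include the effect of $n$ damage circles with
damage densities $f_1, f_2, \ldots, f_n$, find an equivalent damage density (assuming independence)

$$f(\xi,\eta,A) = 1-\prod_{i=1}^{n}\{1-f_i(\xi,\eta,A)\} \tag{18}$$

and integrate this over the target circle.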

However, the actual physical cluster problem does not allow any overlap of damage
circles. The exponential approximation, while it does have overlap, has the impact centres
so far apart in the cluster case that the product in (18) is nearly always close to the particular
value $(1-f_i)$ of that damage circle nearest to $(\xi,\eta)$. We shall assume, therefore, that

$$f(\xi,\eta,A) = \sum_{i=1}^{n} f_i(\xi,\eta,A)$$

and hence

$$E^* = \frac{n}{Z^2}\left(\frac{B^2}{\sigma^2+B^2}\right)\exp\left(-\frac{A^2}{2(\sigma^2+B^2)}\right)\sum_{j=0}^{\infty}\left(\frac{1}{j!}\right)^2\left(\frac{A}{2(\sigma^2+B^2)}\right)^{2j}\left\{\frac{1}{\pi}\iint_{\xi^2+\eta^2\le Z^2}(\xi^2+\eta^2)^j\exp\left(-\frac{\xi^2+\eta^2}{2(\sigma^2+B^2)}\right)d\xi\,d\eta\right\}. \tag{19}$$

The integral in braces can be obtained by a recursion formula. Denoting the term in
braces by $g_j(Z)$, we find

$$g_j(Z) = [2(\sigma^2+B^2)]^{j+1}\int_0^{\frac{1}{2}Z^2/(\sigma^2+B^2)} u^j e^{-u}\,du.$$

Setting $h_j(Z) = [2(\sigma^2+B^2)]^{-(j+1)}g_j(Z)$ and integrating by parts,

$$h_j(Z) = j\,h_{j-1}(Z)-\left(\frac{Z^2}{2(\sigma^2+B^2)}\right)^j\exp\left(-\frac{Z^2}{2(\sigma^2+B^2)}\right). \tag{20}$$

In particular

$$h_0(Z) = 1-\exp\left(-\frac{Z^2}{2(\sigma^2+B^2)}\right). \tag{21}$$


Using the definitions (20) and (21), (19) takes the form

$$E^* = 2n\left(\frac{B}{Z}\right)^2\exp\left(-\frac{A^2}{2(\sigma^2+B^2)}\right)\sum_{j=0}^{\infty}\left(\frac{1}{j!}\right)^2\left(\frac{A^2}{2(\sigma^2+B^2)}\right)^j h_j(Z). \tag{22}$$

Although this formula involves an infinite series, it is easy to use in computation. In most
cases, after six or so terms, the additional terms may be neglected.
The last column of Table 4 presents the comparison of five estimates produced by this 
model with the Monte Carlo values. The agreement is better than for the annulus model. 
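A short sketch of the series computation for the rotating cluster model is given below. It assumes the form E* = 2n(B/Z)² exp(−q) Σⱼ (qʲ/(j!)²) hⱼ with q = A²/(2(σ² + B²)), h₀ = 1 − exp(−c), hⱼ = j hⱼ₋₁ − cʲ exp(−c) and c = Z²/(2(σ² + B²)), which is how formulae (20) to (22) are read here. The function name and the default number of terms are hypothetical, and B is supplied by the caller (for R = 0.4/√2 and Z = 1, relation (8) gives B very close to 0.2).

```python
import math


def rotating_cluster_coverage(n, A, B, sigma, Z=1.0, terms=25):
    """Series (22) for the expected fraction covered by a rotating cluster."""
    s2 = sigma ** 2 + B ** 2
    c = Z * Z / (2.0 * s2)        # upper limit of the incomplete gamma integrals
    q = A * A / (2.0 * s2)
    h = -math.expm1(-c)           # h_0, formula (21)
    coeff = 1.0                   # q^j / (j!)^2
    c_pow = 1.0                   # c^j
    total = coeff * h
    for j in range(1, terms):
        c_pow *= c
        h = j * h - c_pow * math.exp(-c)   # recursion (20)
        coeff *= q / (j * j)
        total += coeff * h
    return 2.0 * n * (B / Z) ** 2 * math.exp(-q) * total


if __name__ == "__main__":
    for sigma in (0.20, 0.40, 0.80):
        print(sigma, round(rotating_cluster_coverage(4, 0.4, 0.2, sigma), 3))
```

The forward recursion loses relative accuracy for large j, but the coefficients qʲ/(j!)² fall off so quickly that this does not matter for the handful of terms that contribute.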


4. MODELS FOR SALVO COVERAGE WITH A SPREAD OF AIM POINTS (PROBLEM 3) 


When several damage circles are independently aimed at the target circle, each with its 
own aim point, the problem of estimating damage is complicated by overlapping (Fig. 3). 

Garwood (1947) found the variance of the overlap when circles are aimed at a target 
circle, the centres of the circles being uniformly distributed over a larger circle containing 
the target circle. Bronowski & Neyman (1945) found the mean and the variance of the 
measure of the area covered by randomly tossing a rectangle n times at a target rectangle. 
Robbins (1945) has considered the analogous problem for a random sum of n-dimensional 
integrals in n-space. Walsh (1960) considers selecting an optimal aim point for a salvo of 
n rounds fired independently at the aim point so as to maximize the kill probability. Ger- 
mond and Dishington have considered this problem in several Rand Corporation reports. 

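The damage density function due to the circle aimed at $(a_i, b_i)$ is approximated by

$$f(\xi,\eta,a_i,b_i) = \frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{(\xi-a_i)^2+(\eta-b_i)^2}{2(\sigma^2+B^2)}\right). \tag{23}$$

The damage probability of point $(\xi,\eta)$ because of effects of the $n$ independent circles is
then approximately given by

$$f(\xi,\eta) = 1-\prod_{i=1}^{n}\left[1-\frac{B^2}{\sigma^2+B^2}\exp\left(-\frac{(\xi-a_i)^2+(\eta-b_i)^2}{2(\sigma^2+B^2)}\right)\right].$$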


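Multiplying out and making an obvious partitioning in the exponents,

$$f(\xi,\eta) = \sum_{s=1}^{n}(-1)^{s-1}\left(\frac{B^2}{\sigma^2+B^2}\right)^s\sum_{1\le i_1<i_2<\ldots<i_s\le n}\exp\left[-\frac{\sum_{k=1}^{s}(a_{i_k}^2+b_{i_k}^2)-\frac{1}{s}\left\{\left(\sum_{k=1}^{s}a_{i_k}\right)^2+\left(\sum_{k=1}^{s}b_{i_k}\right)^2\right\}}{2(\sigma^2+B^2)}\right]$$

$$\times\exp\left[-\frac{\left(\xi-\frac{1}{s}\sum_{k=1}^{s}a_{i_k}\right)^2+\left(\eta-\frac{1}{s}\sum_{k=1}^{s}b_{i_k}\right)^2}{2(\sigma^2+B^2)/s}\right]. \tag{24}$$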
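The expected damage is now obtained by integrating over the circle of radius $Z$.
Division by $\pi Z^2$ then gives $E^*$:

$$E^* = \frac{2\pi}{\pi Z^2}\sum_{s=1}^{n}(-1)^{s-1}\left(\frac{B^2}{\sigma^2+B^2}\right)^s\left(\frac{\sigma^2+B^2}{s}\right)\sum_{1\le i_1<i_2<\ldots<i_s\le n}$$

$$\times\exp\left[-\frac{\sum_{k=1}^{s}(a_{i_k}^2+b_{i_k}^2)-\frac{1}{s}\left\{\left(\sum_{k=1}^{s}a_{i_k}\right)^2+\left(\sum_{k=1}^{s}b_{i_k}\right)^2\right\}}{2(\sigma^2+B^2)}\right]p(R_s, r_{i_1,i_2,\ldots,i_s}), \tag{25}$$

where

$$R_s^2 = \frac{sZ^2}{\sigma^2+B^2}, \qquad r_{i_1,i_2,\ldots,i_s}^2 = \frac{\left(\sum_{k=1}^{s}a_{i_k}\right)^2+\left(\sum_{k=1}^{s}b_{i_k}\right)^2}{s(\sigma^2+B^2)}.$$

The functions $p(R_s, r_{i_1,i_2,\ldots,i_s})$ are the offset circular probability functions defined in
equation (2) which may be determined from the Rand Corporation Tables (1952).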
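The inclusion-exclusion sum in (25) is straightforward to program. The sketch below is an illustration under stated assumptions: `offset_circle_p` evaluates p(R, r) by a Poisson-mixture series rather than by interpolation in the Rand tables, `salvo_coverage` and its optional `reliability` argument are hypothetical names, and B is supplied by the caller (for example from relation (8)).

```python
import math
from itertools import combinations


def offset_circle_p(R, r, tol=1e-15, max_terms=2000):
    """p(R, r) of equation (2), via the noncentral chi-square (2 d.f.) series."""
    x, lam = 0.5 * R * R, 0.5 * r * r
    weight, term = math.exp(-lam), math.exp(-x)
    cum, total = term, 0.0
    for j in range(max_terms):
        total += weight * max(0.0, 1.0 - cum)
        weight *= lam / (j + 1)
        term *= x / (j + 1)
        cum += term
        if j > lam and weight < tol:
            break
    return total


def salvo_coverage(aim_points, B, sigma, Z, reliability=1.0):
    """Formula (25): expected fraction of the target circle covered by
    independently aimed circles with aim points [(a_1, b_1), ...]."""
    s2 = sigma ** 2 + B ** 2
    g = reliability * B * B / s2
    total = 0.0
    for s in range(1, len(aim_points) + 1):
        v = s2 / s                                  # variance of the merged Gaussian factor
        sign = 1.0 if s % 2 == 1 else -1.0
        for subset in combinations(aim_points, s):
            sa = sum(a for a, _ in subset)
            sb = sum(b for _, b in subset)
            sq = sum(a * a + b * b for a, b in subset)
            spread = math.exp(-(sq - (sa * sa + sb * sb) / s) / (2.0 * s2))
            R_s = Z / math.sqrt(v)
            r_s = math.hypot(sa, sb) / (s * math.sqrt(v))
            total += sign * g ** s * v * spread * offset_circle_p(R_s, r_s)
    return 2.0 * total / (Z * Z)


if __name__ == "__main__":
    points = [(2.5, 0.0), (0.0, 2.5), (-2.5, 0.0), (0.0, -2.5)]
    print(salvo_coverage(points, B=1.0, sigma=1.5, Z=5.0, reliability=0.8))
```

The number of terms grows as 2ⁿ − 1, which is negligible for the salvos of a few rounds considered here but would need pruning for large n.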

Some remarks are necessary concerning the value of B in (25). The expression (8) that 
related R, Z, and B was basically a demand that the ‘cookie-cutter’ and the diffused 
damage function achieve the same expected damage. Because only one damage circle 
was involved, there was no question of overlap. Also, in the case of the cluster, the 
individual circles were so far apart that a value for B estimated for each circle as though 
it were the only circle, seemed to be sufficiently accurate. If the aim points $(a_i, b_i)$ are
chosen so that little expected overlap occurs, the value of $B$ given by (8) will again be
satisfactory.


Table 5. Comparison of Monte Carlo salvo trials with the exponential model (25)
(n = 4 circles, p = 0.80 for lattice-point cases)

                                                                    No. of                          E* computed
                                                                    Monte Carlo    Monte Carlo      from (25)
  R/Z       Z     σ      R     Aim points                           trials         value of E*      and (8)
 0.258      5    1.5    1.29   (2.5, 0), (−2.5, 0), (0, −2.5)         20          0.181 ± 0.017      0.184
 0.324      5    1.5    1.62   (2.5, 0), (0, 2.5),
                               (−2.5, 0), (0, −2.5)                   20          0.271 ± 0.027      0.272
 0.40(p)    5    1.0    2      (2.5, 0), (0, 2.5),
                               (−2.5, 0), (0, −2.5)                   30          0.543 ± 0.042      0.569
 0.252     10    1.0    2.52   (5.75, 0), (0, 5.75),
                               (−5.75, 0), (0, −5.75)                 20          0.203 ± 0.0004     0.201
 0.30(p)   10    2.0    3      (5, 0), (0, 5),
                               (−5, 0), (0, −5)                       30          0.327 ± 0.013      0.339


In the extreme case when $(a_i,b_i) = (0,0)$ $(i = 1,2,\ldots,n)$, the expected damage when $n$
‘cookie-cutter’ damage functions are centred at the origin is only $\pi R^2$. Using diffused
exponential damage functions, however, the equation for determining $B$ is

$$\pi R^2 = \iint_{\xi^2+\eta^2\le Z^2}\left\{1-\left[1-\exp\left(-\frac{\xi^2+\eta^2}{2B^2}\right)\right]^n\right\}d\xi\,d\eta. \tag{26}$$

If a table similar to Table 1 were made relating $R$, $Z$, and $B$ using (26), it would be possible
to check the exponential approximation of formula (25) against Dishington’s (1956) tables.
Formula (25) with $B$ estimated from (8) was used to compute the expected fraction of
coverage in several examples, and the computed values compared with the results of some
available Monte Carlo trials. These data appear in Table 5, the intervals of uncertainty being
$\pm 3s/\sqrt{N}$, as in the earlier tables. Lattice-point counting was used to estimate areas in three
cases, and two cases were planimetered (indicated by (p)).

The lattice-point Monte Carlo data include a nonabort or reliability factor: $p = 0.80$.
Expression (23) can be multiplied by $p$ to give the damage probability at $(\xi,\eta)$, also
considering reliability. Then (25) would have

$$(-1)^{s-1}\left(\frac{B^2}{\sigma^2+B^2}\right)^s \quad\text{replaced by}\quad (-1)^{s-1}\left(\frac{pB^2}{\sigma^2+B^2}\right)^s.$$

Agreement in Table 5 is probably well within input inaccuracies for any practical applications
of the model. Linear interpolation of offset probability function values from tables
of Rand Corporation (1952) was used.
Ordered tests in the analysis of variance


By D. J. BARTHOLOMEW 
University College of Wales, Aberystwyth 


1. INTRODUCTION 


In the analysis of variance for a one-way classification it is customary to test the
hypothesis that the samples have come from populations with the same mean. The object of the
present paper is to discuss the best way of making this test when the unknown population
means are subject to order restrictions. A general theory of tests of homogeneity for means
under ordered alternatives has been developed in three earlier papers (Bartholomew,
1959a, b, 1961): these are referred to in this paper as I, II and III, respectively. However,
apart from a brief discussion in I, this earlier work relates to the case where the population
standard deviations are known a priori. This is a natural assumption to make when
investigating the theoretical properties of tests for means but is unrealistic in many practical
applications. In particular, in the analysis of variance the standard deviations are not
usually known, although they can be estimated from the data. In § 2 the $\bar{\chi}^2$-test, first
introduced in I and extended in III, will be generalized to cover this case also. The tests
based on scores discussed in III may also be adapted for use in the analysis of variance. It
will be shown that a close link exists between the latter and the distribution-free test
proposed by Jonckheere (1954). A number of factors, of which power is one of the most
important, must be taken into account when choosing a test. Asymptotic results have been obtained
for all the tests mentioned above and they are used in § 3 to make power comparisons.

It is instructive to examine more closely the kind of practical situation which gives rise
to ordered alternatives in the analysis of variance. The possibility of ranking the class means
implies that they correspond to different levels of one or more underlying variates. To take
a simple example, suppose that there are $k$ classes with means $\mu_1, \mu_2, \ldots, \mu_k$, with the structure

$$\mu_i = \alpha+\beta x_i \quad (i = 1,2,\ldots,k),$$

where $\alpha$ and $\beta$ are unknown constants and $x_1, x_2, \ldots, x_k$ are independent variables for which
ranks only are available. The test for equality of means is also a test that $\beta$, the regression
coefficient, is zero. The ranking of the $x$'s enables us to rank the $\mu$'s. Regression models
involving more than one independent variable, which can be ranked but not measured,
will lead to examples of partial ordering among the $\mu$'s. This interpretation is not, of course,
confined to linear regression models. Suppose that $\mu_i = f(x_i)$ $(i = 1,2,\ldots,k)$, where $f(x)$
is a monotonic (increasing or decreasing) function of $x$. This function determines a ranking
of the $\mu$'s and we may use our theory to test the hypothesis that $f(x)$ is constant. The theory
developed here may thus be looked upon as a link between the analysis of variance and
regression analysis.

The social sciences provide a fruitful source of applications for ordered tests. Variables 
such as ‘intelligence’, ‘social class’ and ‘degree of satisfaction’ do not readily lend them- 
selves to precise measurement but can often be ranked without difficulty. The theory 
presented in this paper thus provides a useful means of testing hypotheses about regression 
models without the necessity of finding a suitable scale of measurement for the independent 
variables. In practice, of course, the testing of hypotheses is only a preliminary step which
serves to select the most promising variables for further study. For the later stages of the 
analysis it is highly desirable that a suitable means of measurement should be devised but 
it is not necessary for making progress in the early stages of the investigation. 


2. THE $\bar{E}^2$-TEST


2.1. Derivation of the test. Suppose that we have $k$ groups of observations such that the
$i$th group consists of $n_i$ members drawn from a normal population with mean $\mu_i$ and
standard deviation $\sigma$ $(i = 1,2,\ldots,k)$. Let the $j$th observation in the $i$th group be denoted
by $x_{ij}$ and the mean of the $i$th group by $\bar{x}_i$. It is assumed that we have prior information
which specifies restrictions of the form $\mu_i > \mu_j$ between some or all of the $\mu$'s. If we now
attempt to apply the $\bar{\chi}^2$-test, as described in III, to the $k$ means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, we immediately
meet a difficulty. The weights of the observations are $n_i/\sigma^2$ and hence $\bar{\chi}^2$ depends on the
unknown $\sigma$. One way of overcoming this difficulty is to use the likelihood ratio principle
to derive a new test treating $\sigma$ as unknown. This leads to the criterion

$$\bar{E}^2 = \sum_{i=1}^{k} n_i(\hat{\mu}_i-\bar{x})^2\Big/\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2,^*$$

where

$$\bar{x} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}\Big/\sum_{i=1}^{k}n_i$$

and $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_k$ are the maximum likelihood estimates of $\mu_1, \mu_2, \ldots, \mu_k$. The latter can be
obtained in the same way as when $\sigma$ is known because the weights are proportional to
$n_1, n_2, \ldots, n_k$, and the $\hat{\mu}$'s depend only on the relative values of the weights. As explained in I,
the averaging process by which these estimates are obtained is equivalent to the pooling of
certain groups resulting in a ‘reduced’ problem of $l$ groups. If the means of these groups
are denoted by $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_l$, and the corresponding sample numbers of the pooled groups
by $N_1, N_2, \ldots, N_l$ then we have

$$\bar{E}^2 = \sum_{i=1}^{l}N_i(\bar{X}_i-\bar{X})^2\Big/\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2, \tag{1}$$

where

$$\bar{X} = \sum_{i=1}^{l}N_i\bar{X}_i\Big/\sum_{i=1}^{l}N_i = \bar{x}.$$

$\bar{E}^2$ is thus the ratio of the ‘between’ groups sum of squares to the total sum of squares for
the reduced problem. Large values of $\bar{E}^2$ indicate a departure from the null hypothesis.
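For a complete (simple) ordering of the means, the pooling that produces the reduced problem can be carried out by the pool-adjacent-violators procedure. The sketch below is illustrative only: it assumes the restriction is that the means are non-decreasing in the order in which the groups are supplied (reverse the list of groups for a decreasing order), and the function names are hypothetical.

```python
def pool_adjacent_violators(means, weights):
    """Weighted isotonic (non-decreasing) fit; returns pooled blocks [mean, weight]."""
    blocks = []
    for m, w in zip(means, weights):
        blocks.append([m, w])
        # Pool the last two blocks while they violate the assumed order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return blocks


def e_bar_squared(groups):
    """Return (E-bar squared, l): the between-groups sum of squares of the
    reduced problem divided by the total sum of squares, and the number of
    pooled groups l."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / sum(sizes)
    blocks = pool_adjacent_violators(means, sizes)
    between = sum(w * (m - grand) ** 2 for m, w in blocks)
    total = sum((x - grand) ** 2 for g in groups for x in g)
    return between / total, len(blocks)


if __name__ == "__main__":
    print(e_bar_squared([[2, 3, 4], [1, 2, 3], [6, 7, 8]]))
```

In the usage example the first two group means (3 and 2) violate the assumed order and are pooled into one group of six observations, so the reduced problem has l = 2 groups.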


2.2. Distribution of $\bar{E}^2$. The null hypothesis distribution of $\bar{E}^2$ may be readily obtained
from results already established. The conditional distribution of $\sum_{i=1}^{l}N_i(\bar{X}_i-\bar{X})^2/\sigma^2$, for given
$l$, is that of $\chi^2$ with $l-1$ degrees of freedom (by the theorem of § 3.1 in III). Standard theory
then gives the conditional distribution of $\bar{E}^2$ as a $\beta$-variable with parameters $\tfrac{1}{2}(l-1)$ and
$\tfrac{1}{2}(N-l)$, $\left(N = \sum_{i=1}^{l}N_i\right)$. The probabilities $p(l,k)$, defined in I, depend only on the relative
weights and are therefore independent of $\sigma^2$. If $\bar{E}_0^2$ denotes an observed value of $\bar{E}^2$ then its
significance may be judged by computing

$$\Pr\{\bar{E}^2\ge\bar{E}_0^2\} = \sum_{l=2}^{k}p(l,k)\,I_{1-y}\{\tfrac{1}{2}(N-l),\tfrac{1}{2}(l-1)\}, \tag{2}$$

where $I_x(p,q)$ is the incomplete $\beta$-function and $y = \bar{E}_0^2$. A numerical example of the
application of this test was given in I although, as we shall see below, the theory there was different.

* The symbol $\bar{E}^2$ is suggested by the use of $E^2$ for this ratio in the unrestricted problem; the addition
of the bar, as in $\bar{\chi}^2$, indicates that the means are ordered.

In the analysis of variance it is customary to calculate not $E^2$ but $F$, which is the ratio
of the ‘between’ groups mean square to the ‘within’ groups mean square. For any given
table, $F$ and $E^2$ are related as follows

$$F = \nu_2E^2/\{\nu_1(1-E^2)\},$$

where $\nu_1$ and $\nu_2$ are the degrees of freedom of the numerator and denominator of $F$,
respectively. The choice of $F$ rather than $E^2$ is purely a matter of convenience because the two
tests are identical. This is not so with their ordered counterparts. Suppose that we use a
statistic $\bar{F}$ defined as the $F$-ratio for the reduced problem, then

$$\Pr\{\bar{F}\ge F_0\} = \sum_{l=2}^{k}p(l,k)\Pr\{F_{l-1,N-l}\ge F_0\} = \sum_{l=2}^{k}p(l,k)\,I_{1-z}\{\tfrac{1}{2}(N-l),\tfrac{1}{2}(l-1)\}, \tag{3}$$

where

$$z = (l-1)F_0/\{(N-l)+(l-1)F_0\}. \tag{4}$$

Both $\bar{F}$ and $\bar{E}^2$ provide a valid test of the null hypothesis but they use different critical
regions. This may be seen by comparing $z$ with $\bar{E}_0^2$ expressed in terms of $F_0$, thus

$$\bar{E}_0^2 = (l_0-1)F_0/\{(N-l_0)+(l_0-1)F_0\}, \tag{5}$$

where $l_0$ is the observed value of $l$. Equations (4) and (5) differ in that $l$ is a variable of
summation in (4) and a constant in (5). The two results were regarded as equivalent in I (eq. (8))
and in II because of a failure to make this distinction.*
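The distinction between (4) and (5) is easy to see numerically. In the sketch below, which is an illustration with hypothetical function names and made-up values of N, l₀ and F₀, the same observed F-ratio is converted to the E² scale for each possible number of pooled groups l; only the value at l = l₀ equals the observed Ē₀².

```python
def e2_from_f(F, l, N):
    """E^2 corresponding to an F-ratio with (l - 1, N - l) degrees of freedom."""
    return (l - 1) * F / ((N - l) + (l - 1) * F)


def f_from_e2(E2, l, N):
    """Inverse relation: F = nu2 * E^2 / (nu1 * (1 - E^2)), nu1 = l - 1, nu2 = N - l."""
    return (N - l) * E2 / ((l - 1) * (1.0 - E2))


if __name__ == "__main__":
    N, l0, F0 = 12, 2, 4.0                  # hypothetical observed values
    print("observed E-bar^2:", e2_from_f(F0, l0, N))
    for l in range(2, 5):
        # z of equation (4): the E^2-scale rejection level implied by F0 when l groups remain
        print("l =", l, " z =", e2_from_f(F0, l, N))
```

The printed z values change with l, which is the sense in which the F-ratio of the reduced problem behaves like an E²-type test whose rejection level depends on l.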


2.3. Comparison with the $\bar{F}$-test. The $\bar{F}$-test, as described above, may be looked upon as
an $\bar{E}^2$-test for which the rejection level depends upon $l$. In III we showed how the power of
ordered tests depended on the choice of critical region when $\sigma$ was known. The results
suggested that the likelihood ratio test, for which the rejection level is constant, could not
be substantially improved upon. We may therefore expect that $\bar{E}^2$ will prove to be more
powerful than $\bar{F}$. It is, in fact, possible to make large sample power comparisons using
existing theory. In the limiting case, as $N\to\infty$, $k$ fixed, the analysis of variance problem
treated here and that for known $\sigma$ discussed previously become equivalent because $\sigma^2$ is
estimated with infinitely many degrees of freedom. The results given in Table 1, for complete
ordering of the $\mu$'s, were obtained using this equivalence. Some of these results have already
been given in III but have been reproduced here for ease of comparison. The power has
been tabulated as a function of

$$\Delta = \left\{\sum_{i=1}^{k}n_i(\mu_i-\bar{\mu})^2\right\}^{\frac{1}{2}}\Big/\sigma \qquad \left(\bar{\mu} = \sum_{i=1}^{k}n_i\mu_i\Big/\sum_{i=1}^{k}n_i\right).$$

For reasons given in III the power also depends upon the configuration of the $\mu$'s; two extreme
cases have therefore been considered. It has been assumed that $n_1 = n_2 = \ldots = n_k$ which
makes the weights equal. It will be seen that although $\bar{F}$ is an improvement over $F$, it is
much inferior to $\bar{E}^2$. The difference is so large that we may confidently expect the conclusion
to apply when the sample sizes are finite.


* I am indebted to Mr J. E. K. Smith of the Massachusetts Institute of Technology for first drawing 
my attention to this point. 
Table 1. The asymptotic power functions of Ē², F̄ and F when μ₁ > μ₂ > ... > μ_k

                                    Δ
                  0        1        2        3        4
 k = 3    Ē²    0.050    0.244    0.605    0.892    0.987
                0.050    0.221    0.569    0.872    0.983
          F̄     0.050    0.198    0.488    0.799    0.960
                0.050    0.196    0.515    0.835    0.972
          F     0.050    0.130    0.402    0.776    0.959
 k = 4    Ē²    0.050    0.239    0.594    0.885    0.980
                0.050    0.202    0.531    0.849    0.978
          F̄     0.050      —        —        —        —
                0.050    0.168    0.454    0.785    0.957
          F     0.050    0.115    0.350    0.710    0.945

The upper figure of each pair corresponds to equal spacing of the μ's and the lower figure to the case
when all but one of the μ's are equal.


3. OTHER TESTS 


3.1. Tests based on scores. We have already pointed out that our problem has close links
with regression theory. This correspondence can be exploited if the unknown $\mu$'s are replaced
by scores satisfying the same order restrictions. This approach has been discussed by
Armitage (1955), in an unpublished report by Abelson & Tukey and in III. If $\sigma$ were
known in the present problem our test would be based on

$$T = \sum_{i=1}^{k}c_in_i\bar{x}_i\Big/\left\{\left(\sum_{i=1}^{k}c_i^2n_i\right)^{\frac{1}{2}}\sigma\right\}, \tag{6}$$

where $c_1, c_2, \ldots, c_k$ are appropriately chosen scores satisfying $\sum_{i=1}^{k}c_in_i = 0$ and the order
restrictions. $T$ is proportional to the regression coefficient of $\bar{x}$ on $c$; on the null hypothesis
it is normally distributed with zero mean and unit variance. The natural modification when
$\sigma$ is unknown is to replace it by the estimate from the within groups mean square of the
analysis of variance table. The null hypothesis distribution of $T$ will then be that of $t$ with
$\Sigma(n_i-1)$ degrees of freedom. The power function of the test could be obtained from tables
of the non-central $t$-distribution. The power comparisons with $\bar{\chi}^2$ for equal weights and $\sigma$
known given in III (Table 9) may be looked upon as large sample results for the analysis of
variance problem. It was shown in III that there was little difference between $\bar{\chi}^2$ and the
optimum scoring test if $k$ is small and the ranking of the $\mu$'s complete. On the other hand,
if the order restrictions were not complete, $\bar{\chi}^2$ was much to be preferred. These conclusions
apply asymptotically to $\bar{E}^2$ and $T$. The optimum method of choosing the scores $c_1, c_2, \ldots, c_k$
has been given by Abelson & Tukey for the case of equal weights; some of their results
are given in III.
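As a small worked illustration of (6) with σ replaced by its within-groups estimate, the sketch below computes T and its degrees of freedom. The function name and the linear scores used in the example are hypothetical choices; the optimum scores of Abelson & Tukey are not reproduced here.

```python
import math


def score_test_T(groups, scores):
    """Statistic (6) with sigma estimated from the within-groups mean square.

    Returns (T, degrees of freedom). The scores must satisfy sum(c_i * n_i) = 0.
    """
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    if abs(sum(c * n for c, n in zip(scores, sizes))) > 1e-9:
        raise ValueError("scores must satisfy sum(c_i * n_i) = 0")
    within_ss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df = sum(sizes) - len(groups)
    s = math.sqrt(within_ss / df)
    numerator = sum(c * n * m for c, n, m in zip(scores, sizes, means))
    denominator = math.sqrt(sum(c * c * n for c, n in zip(scores, sizes))) * s
    return numerator / denominator, df


if __name__ == "__main__":
    groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
    print(score_test_T(groups, scores=[-1, 0, 1]))   # linear scores, equal group sizes
```

The resulting T would be referred to the t-distribution with the returned degrees of freedom, one-sided in the direction implied by the ordering of the scores.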


3.2. Jonckheere’s test. When the $\mu$'s can be completely ranked under the alternative
hypothesis $(\mu_1 > \mu_2 > \ldots > \mu_k)$ Jonckheere (1954) suggested the following distribution-free
test criterion:

$$S = 2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}(p_{ij}-\tfrac{1}{2}n_in_j), \tag{7}$$

where

$$p_{ij} = \sum_{r=1}^{n_i}\sum_{s=1}^{n_j}p_{ir,js}$$

and

$$p_{ir,js} = \begin{cases}1 & \text{if } x_{ir} > x_{js},\\ 0 & \text{if } x_{ir} < x_{js}\end{cases} \qquad (i < j).$$

Jonckheere showed that $S$ was the appropriate form of Kendall’s coefficient of rank correlation
when one ranking contains ties. The sampling distribution of $S$, on the null hypothesis,
is approximately normal with zero mean and variance

$$\frac{1}{18}\left\{N^2(2N+3)-\sum_{i=1}^{k}n_i^2(2n_i+3)\right\}.$$
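The statistic (7) and its null variance can be computed directly from the definitions, as in the following sketch. The function names are hypothetical, and tied observations between groups are simply scored as 0, an assumption made here because the definition of the scores above does not cover ties.

```python
def jonckheere_S(groups):
    """S = 2 * sum over i < j of (p_ij - n_i n_j / 2), where p_ij counts pairs
    with an observation of group i exceeding an observation of group j."""
    k = len(groups)
    S = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            p_ij = sum(1 for x in groups[i] for y in groups[j] if x > y)
            S += 2.0 * (p_ij - 0.5 * len(groups[i]) * len(groups[j]))
    return S


def jonckheere_null_variance(groups):
    """Null variance {N^2 (2N + 3) - sum n_i^2 (2 n_i + 3)} / 18."""
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    return (N * N * (2 * N + 3) - sum(n * n * (2 * n + 3) for n in sizes)) / 18.0


if __name__ == "__main__":
    data = [[5, 6], [3, 4], [1, 2]]       # groups listed in the hypothesised decreasing order
    S = jonckheere_S(data)
    print(S, S / jonckheere_null_variance(data) ** 0.5)
```

The second printed value is the approximate unit normal deviate obtained by dividing S by its null standard deviation.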


It is instructive to demonstrate the connexion between $S$ and the scoring tests discussed
above. Suppose that we put $p_{ir,js} = (x_{ir}-x_{js})$ instead of scoring 0 or 1 according to the
sign of the difference. Then we should have the criterion

$$S' = 2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\{n_in_j(\bar{x}_i-\bar{x}_j)-\tfrac{1}{2}n_in_j\} = 2\sum_{i=1}^{k}c_in_i\bar{x}_i-\text{a constant},$$

where

$$c_i = \sum_{r=i+1}^{k}n_r-\sum_{r=1}^{i-1}n_r \quad (i = 2,3,\ldots,k-1), \qquad c_1 = \sum_{r=2}^{k}n_r, \qquad c_k = -\sum_{r=1}^{k-1}n_r.$$

Since $c_1 > c_2 > \ldots > c_k$ and $\sum_{i=1}^{k}c_in_i = 0$, $S'$ is essentially the same as the numerator of (6)
and its significance would be judged in the way described for $T$. If $n_1 = n_2 = \ldots = n_k$, the
coefficients $c_i$ are in arithmetic progression and therefore the test is equivalent to the
regression coefficient calculated on the assumption that the $\mu$'s are equally spaced. Because
$S$ takes account only of the sign of the differences between observations we should expect
it to be less powerful than $S'$. On the other hand, it must be remembered that $S$ does not
require the assumption of normality.


3.3. The power of Jonckheere's test. The power function of S, like that of $\bar{E}^2$, depends 
upon k parameters but, by specifying the configuration of the $\mu$'s, only one need be 
considered. By assuming that $n_1 = n_2 = \ldots = n_k = n$ here, and throughout this section, we 
may express the power as a function of 

$$\delta = \left\{\sum_{i=1}^{k}(\mu_i - \bar\mu)^2\right\}^{1/2} \Big/ \sigma,$$

which may be regarded as a measure of the 'distance' of the alternative from the null 
hypothesis. Let the mean and standard error of S for given $\delta$ and n be $\mu_n(S|\delta)$ and $\sigma_n(S|\delta)$, 
respectively. If $\delta$ is fixed and n is allowed to increase without limit, it is clear that the power 
will tend to 1. However, to obtain comparisons with $\bar{E}^2$ we require the power as a function of 

$$\Delta = \left\{n\sum_{i=1}^{k}(\mu_i - \bar\mu)^2\right\}^{1/2} \Big/ \sigma = \delta\sqrt{n},$$

in which the root sum of squares of the $\mu$'s 
is divided by the standard error of the means ($\bar x_i$) and not by the standard deviation of an 
individual observation. If $\Delta$ is to remain fixed as $n \to \infty$ then $\delta$ must tend to zero with $n^{-1/2}$. 
If we assume that S is asymptotically normally distributed in the neighbourhood of $\delta = 0$, 
as well as at $\delta = 0$, then it may be shown (Noether, 1955) that as $n \to \infty$ 

$$\text{Power} \to Q\left(\nu_\alpha - \frac{\Delta K}{\sqrt{n}\,\sigma_n(S|0)}\right), \tag{8}$$

where $\nu_\alpha$ is the upper 100$\alpha$ % point of the unit normal distribution, 

$$K = \left[\frac{\partial\mu_n(S|\delta)}{\partial\delta}\right]_{\delta=0} \quad\text{and}\quad Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{1}{2}u^2}\,du.$$





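In order to determine the power we therefore require the expected value of S under the 
alternative hypothesis. Since the observations are normally distributed with mean $\mu_i$ and 
standard deviation $\sigma$, 

$$E(p_{ir,js}) = \Pr\{x_{ir} > x_{js}\} = \Phi\left(\frac{\mu_i - \mu_j}{\sigma\sqrt{2}}\right).$$

It then follows that 

$$E(p_{ij}) = n^2\,\Phi\left(\frac{\mu_i - \mu_j}{\sigma\sqrt{2}}\right)$$

and therefore 

$$\mu_n(S|\delta) = 2n^2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\left\{\Phi\left(\frac{\mu_i - \mu_j}{\sigma\sqrt{2}}\right) - \frac{1}{2}\right\}. \tag{9}$$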


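In order to calculate the asymptotic power we must first specify the configuration of $\mu$'s 
in which we are interested and then express (9) in terms of $\delta$. The method will be made clear 
by considering the calculations for the two cases used for comparison with $\bar{E}^2$. 

(a) $\mu$'s equally spaced. Let $\mu_{i-1} - \mu_i = \lambda$ (i = 2, 3, ..., k); then $\mu_i - \mu_j = (j - i)\lambda$ (j > i). Also 

$$\sigma^2\delta^2 = \sum_{i=1}^{k}(\mu_i - \bar\mu)^2 = \frac{k(k^2-1)}{12}\lambda^2;$$

therefore, eliminating $\lambda$, we have 

$$\lambda = \sigma\delta\sqrt{\frac{12}{k(k^2-1)}}.$$

Substituting in (9), 

$$\mu_n(S|\delta) = 2n^2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\left\{\Phi\left(\frac{(j-i)\,\delta\sqrt{6}}{\sqrt{k(k^2-1)}}\right) - \frac{1}{2}\right\};$$

hence 

$$K = n^2\sqrt{\frac{k(k^2-1)}{3\pi}}.$$

Now 

$$\sigma_n^2(S|0) = \frac{n^3k(k^2-1)}{9} + O(n^2),$$

so that, in the limit, 

$$\text{Power} = Q\left(\nu_\alpha - \Delta\sqrt{\frac{3}{\pi}}\right). \tag{10}$$

(b) $\mu_1 > \mu_2 = \ldots = \mu_k$. By a similar argument it may be shown that the asymptotic 
power in this case is given by 

$$\text{Power} = Q\left(\nu_\alpha - \frac{3\Delta}{\sqrt{\pi(k+1)}}\right). \tag{11}$$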


It is obvious that, unless k = 2, the test is more powerful in detecting a given A when the 
means are equally spaced than when all but one are equal. Further, in the latter case the 
power decreases as k increases. 
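As a check on the two limiting formulae, the sketch below evaluates them numerically. It assumes the readings $Q(\nu_\alpha - \Delta\sqrt{3/\pi})$ for equal spacing and $Q(\nu_\alpha - 3\Delta/\sqrt{\pi(k+1)})$ when all but one of the means are equal; the function names are illustrative only.

```python
from math import erfc, pi, sqrt
from statistics import NormalDist


def upper_tail(x):
    """Q(x): upper tail area of the unit normal distribution."""
    return 0.5 * erfc(x / sqrt(2.0))


def power_equal_spacing(delta, alpha=0.05):
    """Limiting power of S when the means are equally spaced."""
    nu = NormalDist().inv_cdf(1.0 - alpha)
    return upper_tail(nu - delta * sqrt(3.0 / pi))


def power_one_mean_differs(k, delta, alpha=0.05):
    """Limiting power of S when all but one of the k means are equal."""
    nu = NormalDist().inv_cdf(1.0 - alpha)
    return upper_tail(nu - 3.0 * delta / sqrt(pi * (k + 1)))


for k in (3, 4, 8, 12):
    row = [round(power_one_mean_differs(k, d), 3) for d in range(5)]
    print(k, row)
```

For k = 2 the two functions coincide, and for larger k the second falls below the first, in line with the remarks above. The printed rows can be compared with the lower S entries of Table 2.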








It is interesting to compare (10) and (11) with the corresponding results for the S' test 
which uses the actual differences between pairs of observations. Here we find the powers 
to be 

$$\text{(a)}\; Q(\nu_\alpha - \Delta) \quad\text{and}\quad \text{(b)}\; Q\left(\nu_\alpha - \Delta\sqrt{\frac{3}{k+1}}\right). \tag{12}$$


There is, therefore, a reduction in power through using S instead of S’ but it is only a small 
one. S has the compensating advantage of being independent of the assumption of nor- 
mality. An alternative way of comparing S and S’ is to compute the asymptotic relative 
efficiency for the two tests; this is the ratio of sample sizes required to attain equal power 


Table 2. The asymptotic power functions of $\bar{E}^2$, S and F when $\mu_1 > \mu_2 > \ldots > \mu_k$ 

                                  Δ
                     0       1       2       3       4

k = 3    Ē²        0.050   0.244   0.605   0.892   0.987
                   0.050   0.221   0.569   0.872   0.983
         S         0.050   0.252   0.622   0.901   0.988
                   0.050   0.212   0.519   0.814   0.959
         F         0.050   0.130   0.402   0.776   0.959

k = 4    Ē²        0.050   0.239   0.594   0.885   0.980
                   0.050   0.202   0.531   0.849   0.978
         S         0.050   0.252   0.622   0.901   0.988
                   0.050   0.187   0.448   0.734   0.917
         F         0.050   0.115   0.350   0.710   0.945

k = 8    Ē²        0.050     —       —       —       —
                   0.050   0.191   0.456   0.800   0.973
         S         0.050   0.252   0.622   0.901   0.988
                   0.050   0.140   0.303   0.519   0.730
         F         0.050   0.090   0.249   0.535   0.853

k = 12   Ē²        0.050     —       —       —       —
                   0.050   0.178   0.423   0.766   0.963
         S         0.050   0.252   0.622   0.901   0.988
                   0.050   0.120   0.240   0.406   0.592
         F         0.050   0.080   0.205   0.466   0.776

The upper figure of each pair corresponds to equal spacing of the $\mu$'s and the lower figure to the 
case when all but one of the $\mu$'s are equal. 


for the same alternative (Noether, 1955). For both configurations which we have considered 
A.R.E. = $3/\pi$ = 0.95 and this may be shown to hold for all configurations. This is in fact a 
generalization of a well-known result for k = 2 (see, for example, Mood, 1954). In this case 
S is the Wilcoxon or Mann-Whitney statistic and S' is asymptotically the same as the two-sample 
t-test. For k = 2 it is known that S' is the most powerful test but this is not true for 
k > 2. Abelson & Tukey have shown that the power given by (12) can be increased for 
k > 3 by choosing unequally spaced scores. 

Asymptotic relative efficiency cannot be used to compare S (or S') with $\bar{E}^2$ because the 
limiting distribution of the latter is not even approximately normal unless k is very large. 
Some numerical results for $\bar{E}^2$, S and F are given in Table 2. Some of the figures for $\bar{E}^2$ and 
F have already been given in Table 1, but they are reproduced here for ease of comparison. 
For k = 3 and 4 there is little to choose between $\bar{E}^2$ and S; the latter is slightly more powerful 
than $\bar{E}^2$ when the $\mu$'s are equally spaced and slightly less powerful at the other extreme when 
all but one of the $\mu$'s are equal. In view of the distribution-free character of S it should, 
perhaps, be preferred to $\bar{E}^2$ under these circumstances. The position is altered for larger k. 
Although S will maintain its superiority for equal spacing it is very much inferior at the 
other extreme. In fact, in this case, when k = 8 and 12, S is worse than F when $\Delta$ > 2 and 
very much worse than $\bar{E}^2$. The upper values of $\bar{E}^2$ for k = 8 and 12 are not known but are 
unlikely to exceed the upper values for S. 


4. CONCLUSIONS 


In this paper we have derived the $\bar{E}^2$-test for ordered alternatives in the analysis of 
variance and reviewed a number of existing tests. In an attempt to select the best test we 
have made asymptotic power comparisons. Use has been made of the fact that, in the limit, 
some of the analysis of variance tests are equivalent to tests considered in III. Thus $\bar{E}^2$ 
and $\bar\chi^2$, F and $\chi^2$ have the same limiting power. New results have been obtained for S. The 
main conclusions on relative power are set out below. It must be borne in mind that they 
are asymptotic results only and that they involve the assumption of normality. 

(a) $\bar{E}^2$ is always to be preferred to F as defined in I. 

(b) If the $\mu$'s can be completely ranked under the alternative there is little to choose 
between $\bar{E}^2$ and S if k is small (3 or 4), but if k is large (say > 4) S is liable to be worse than 
F and therefore $\bar{E}^2$ should be used. 

To these we may add a conclusion from III which will be true asymptotically here. 

(c) If there is partial ordering of the $\mu$'s, $\bar{E}^2$ is better than tests using Abelson and Tukey's 
optimum scores (S is not applicable). 
Quota fulfilment using unrestricted random sampling 


By D. H. YOUNG* 
University College London 


1. INTRODUCTION 


The problem to be considered arises in the following way. A sample is required from a 
population, divided into k strata $w_1, w_2, \ldots, w_k$, such that $m_i$ individuals are obtained from 
$w_i$ (i = 1, 2, ..., k). $m_i$ will be termed the specified quota for stratum $w_i$. Suppose that 
either due to cost considerations or sampling restrictions it is necessary to sample randomly 
from the population without regard to stratification. A sequential scheme may then be used 
in which individuals are randomly and unrestrictedly drawn from the population until all 
the specified quotas have been obtained. When sampling finishes, the exact quota will have 
been drawn from one of the strata, the remaining k − 1 strata having yielded at least their 
specified quotas. The number of individuals n ($n \ge \sum_{i=1}^{k} m_i$) which are sequentially drawn in 
order to obtain $m_1, m_2, \ldots, m_k$ individuals from $w_1, w_2, \ldots, w_k$, respectively, will be called 
the least sample number for quota fulfilment. Finally, it is assumed that an effectively 
infinite population, with a proportion $p_i$ of individuals belonging to $w_i$, is to be sampled 
($\sum p_i = 1$). A general extension of the theory to the finite case would present many additional 
problems and is beyond the scope of the present paper. 
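The sequential scheme just described is easy to simulate. The following is a minimal sketch rather than anything from the paper: the function name and the use of Python's `random.Random` as the source of draws are assumptions made for illustration.

```python
import random


def least_sample_number(quotas, probs, rng):
    """Draw individuals one at a time until every stratum quota is met.

    quotas[i] is the specified quota m_i, probs[i] the stratum proportion p_i,
    and rng a random.Random instance.  Returns the number of draws n.
    """
    counts = [0] * len(quotas)
    unmet = sum(1 for m in quotas if m > 0)  # strata still short of quota
    strata = range(len(quotas))
    n = 0
    while unmet > 0:
        i = rng.choices(strata, weights=probs)[0]  # unrestricted random draw
        n += 1
        counts[i] += 1
        if counts[i] == quotas[i]:  # this draw completes stratum i
            unmet -= 1
    return n


rng = random.Random(12345)
draws = [least_sample_number((2, 2), (0.5, 0.5), rng) for _ in range(10000)]
mean_n = sum(draws) / len(draws)  # Monte Carlo estimate of the mean of n
```

Averaging many replicates in this way gives a rough numerical check on the exact expectations derived later in the paper.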

It is of interest to note that for the special case $m_i = 1$, $p_i = k^{-1}$ (i = 1, 2, ..., k) the 
sequential procedure for quota fulfilment is analogous to the classical occupancy problem 
where balls are randomly and independently thrown into k identical compartments until 
every compartment is occupied. This case was discussed in a recent paper by Barton & 
David (1959). The sequential sampling scheme for the binomial case k = 2, with $m_1 = m$ 
say and $m_2 = 0$, has also been considered by several writers (e.g. Haldane (1945) and Finney 
(1949)) as an alternative to the standard method of estimating the frequency of an attribute 
from a sample of fixed size. 

In this paper some of the basic statistical properties of the random variable n will be 
considered for more general cases. The results will, for example, be of use in demonstrating 
the effect of variation in the $m_i$ and $p_i$, for given k, on: 

(a) The moments and, in particular, the expectation of the total number of individuals 
in excess of the sum of specified quotas ($M = \sum m_i$) when sampling finishes. This would be 
useful for comparing the expected cost of the unrestricted sampling procedure with the 
cost of sampling restricted to the individual strata, assuming known (different) cost functions 
for the two procedures. 

(b) The probability that there will be need for additional sampling, after a specified number 
of individuals has been drawn from the population. 

Results of this kind could be applied to give more flexible principles for test procedures 
of the type described by Johnson (1957), where an 'optimal' random sample is drawn without 
regard to stratification and any deficiencies then made up by sampling restricted to each 
deficient stratum. 


* Now at the Unilever Research Laboratory, Sharnbrook, Bedford. 
The distribution function and moments of n are not in general capable of concise expression. 
General formulae will be derived first and then some special cases considered in detail. 
Several approximations to the upper percentage points of the distribution of n are developed, 
one being based on the distribution of the extreme positive deviate from the sample mean 
in a sample of independent unit normal variables. The relative accuracy of the approximations 
is investigated empirically and a useful upper bound for the maximum error in 
significance level is given for one of them. 


2. DISTRIBUTION FUNCTIONS OF THE LEAST SAMPLE NUMBER FOR QUOTA FULFILMENT 


The distribution function of n may be obtained as follows. In order that sampling finishes 
after a total of n unrestricted drawings from the population, the specified quotas from k − 1 
of the strata, and exactly $m_i - 1$ individuals from the remaining stratum, $w_i$ say, must be 
obtained in the first n − 1 drawings, the nth drawing yielding a further individual from the 
deficient stratum $w_i$. Since the joint distribution of the stratum frequencies in a random 
sample of size n − 1 is multinomial of order k with parameters $p_1, p_2, \ldots, p_k$, the probability 
of n is 

$$p(n) = \sum_{i=1}^{k}\binom{n-1}{m_i-1}p_i^{m_i}(1-p_i)^{n-m_i}\sum\cdots\sum\,(n-m_i)!\prod_{j\ne i}\frac{1}{s_j!}\left(\frac{p_j}{1-p_i}\right)^{s_j}$$

$$\phantom{p(n)} = \sum_{i=1}^{k}\frac{(n-1)!}{(m_i-1)!}\,p_i^{m_i}\sum\cdots\sum\prod_{j\ne i}\frac{p_j^{s_j}}{s_j!}, \tag{1}$$

where the inner multiple summation is over all integer values of $s_j$ (j = 1, 2, ..., k; j ≠ i) 
subject to the conditions $\sum s_j = n - m_i$ and $s_j \ge m_j$ for all j. This form of expression is quite 
convenient for actual computation of p(n), at least for relatively small values of k and $m_i$. 
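An alternative form, less concise than (1), but more suitable for studying the moments of n, is 

$$p(n) = \sum_{i=1}^{k}\binom{n-1}{m_i-1}p_i^{m_i}\left\{(1-p_i)^{n-m_i} + \sum_{t=1}^{k-2}(-1)^t\sum_{G(t)}\;\sum_{\{s_{j_u}<m_{j_u}\}}\frac{(n-m_i)!}{(n-m_i-S_t)!}\left(\prod_{u=1}^{t}\frac{p_{j_u}^{s_{j_u}}}{s_{j_u}!}\right)(1-p_i-B_t)^{n-m_i-S_t}\right\}, \tag{2}$$

where 

$$S_t = \sum_{u=1}^{t}s_{j_u}, \qquad B_t = \sum_{u=1}^{t}p_{j_u},$$

and where $\sum_{G(t)}$ denotes the summation over all possible integer sets of $(j_1, j_2, \ldots, j_t)$ subject 
to the conditions 

$$1 \le j_u \le k \;\;(u = 1, 2, \ldots, t), \qquad j_u \ne i \text{ for all } u, \qquad j_u \ne j_v \text{ if } u \ne v, \tag{3}$$

and $\sum_{\{s_{j_u}<m_{j_u}\}}$ represents the multiple sum over integer values $s_{j_1}, s_{j_2}, \ldots, s_{j_t}$ subject to 
$s_{j_u} \le m_{j_u} - 1$ (u = 1, 2, ..., t). Equation (2) is valid for integer values of $k \ge 2$, provided 
$\sum_{t=1}^{0}$ is interpreted as zero. For the symmetric case $m_i = m$, $p_i = k^{-1}$ (i = 1, 2, ..., k), (2) 
simplifies to 

$$p(n) = \frac{(n-1)!}{(m-1)!\,k^{n-1}}\sum_{t=0}^{k-2}(-1)^t\binom{k-1}{t}\sum_{\{s_u<m\}}\frac{(k-t-1)^{n-m-S_t}}{s_1!\cdots s_t!\,(n-m-S_t)!}. \tag{4}$$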
If in (4) we put m = 1 we obtain the well-known results for the classical sequential occupancy 
problem 

$$p(n) = \sum_{t=0}^{k-2}(-1)^t\binom{k-1}{t}\left(1-\frac{t+1}{k}\right)^{n-1}, \tag{5}$$

or, in terms of differences of zero, 

$$p(n) = \frac{\Delta^{k-1}0^{n-1}}{k^{n-1}}. \tag{6}$$

The probability distribution of n also simplifies a great deal for the case m = 2 and may 
be written 

$$p(n) = \frac{1}{k^{n-1}}\sum_{t=0}^{k-2}(-1)^t\binom{k-1}{t}\sum_{s=0}^{t}\binom{t}{s}\frac{(n-1)!}{(n-s-2)!}\,(k-t-1)^{n-2-s}. \tag{7}$$


3. MOMENTS OF n 
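The ascending factorial moments of n may be derived directly from (2) but, as we would 
expect, they are not capable of concise expression for general $m_i$ and $p_i$ (i = 1, 2, ..., k). 
We will derive the general formula and then proceed to discuss special cases. 
From (2), the rth ascending factorial moment of n is given by 

$$E(n(n+1)\ldots(n+r-1)) = \sum_{i=1}^{k}\frac{p_i^{m_i}}{(m_i-1)!}\left\{\sum_{n=M}^{\infty}\frac{(n+r-1)!}{(n-m_i)!}(1-p_i)^{n-m_i} + \sum_{t=1}^{k-2}(-1)^t\sum_{G(t)}\;\sum_{\{s_{j_u}<m_{j_u}\}}\left(\prod_{u=1}^{t}\frac{p_{j_u}^{s_{j_u}}}{s_{j_u}!}\right)\sum_{n=M}^{\infty}\frac{(n+r-1)!}{(n-m_i-S_t)!}(1-p_i-B_t)^{n-m_i-S_t}\right\}.$$

Applying the result for the summation with respect to the index of a binomial term 

$$\sum_{N=K}^{\infty}\binom{N}{\sigma}(1-p)^{N-\sigma} = p^{-\sigma-1}I_{1-p}(K-\sigma, \sigma+1),$$

where in the usual incomplete beta function notation 

$$I_p(a, N-a+1) = 1 - \sum_{w=0}^{a-1}\binom{N}{w}p^w(1-p)^{N-w},$$

gives 

$$E(n(n+1)\ldots(n+r-1)) = \sum_{i=1}^{k}\frac{(m_i+r-1)!}{(m_i-1)!}\,p_i^{-r}\,I_{1-p_i}(M-m_i, m_i+r)$$

$$\qquad + \sum_{i=1}^{k}\sum_{t=1}^{k-2}(-1)^t\sum_{G(t)}\;\sum_{\{s_{j_u}<m_{j_u}\}}\frac{(m_i+S_t+r-1)!}{(m_i-1)!}\,\frac{p_i^{m_i}}{(p_i+B_t)^{m_i+S_t+r}}\,I_{1-p_i-B_t}(M-m_i-S_t, m_i+r+S_t)\prod_{u=1}^{t}\frac{p_{j_u}^{s_{j_u}}}{s_{j_u}!}. \tag{8}$$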


Some specific cases will now be considered for which (8) reduces to fairly concise forms. 
(a) k = 2, general $m_i$ and $p_i$ (i = 1, 2) 
For the binomial case, the rth ascending factorial moment of n is 

$$E(n(n+1)\ldots(n+r-1)) = \frac{(m_1+r-1)!}{(m_1-1)!}\,p_1^{-r}\,I_{p_2}(m_2, m_1+r) + \frac{(m_2+r-1)!}{(m_2-1)!}\,p_2^{-r}\,I_{p_1}(m_1, m_2+r). \tag{9}$$
For large $m_1$ and $m_2$ standard normal approximations may be applied to the incomplete 
beta functions, giving 

$$E(n(n+1)\ldots(n+r-1)) \simeq \frac{(m_1+r-1)!}{(m_1-1)!}\,p_1^{-r}\,\Phi\left\{\frac{m_1p_2 - m_2p_1 + rp_2 + p_1 - \frac{1}{2}}{(m_1+m_2+r-1)^{1/2}(p_1p_2)^{1/2}}\right\}$$

$$\qquad + \frac{(m_2+r-1)!}{(m_2-1)!}\,p_2^{-r}\,\Phi\left\{\frac{m_2p_1 - m_1p_2 + rp_1 + p_2 - \frac{1}{2}}{(m_1+m_2+r-1)^{1/2}(p_1p_2)^{1/2}}\right\},$$

where 

$$\Phi\{x\} = (2\pi)^{-1/2}\int_{-\infty}^{x}e^{-\frac{1}{2}t^2}\,dt.$$

If accurate estimates or even the exact values of the stratum frequencies $p_1$, $p_2$ are 
available before sampling starts, the specified quotas $m_1$, $m_2$ may be chosen proportional 
to $p_1$, $p_2$, respectively. In this case 

$$E(n(n+1)\ldots(n+r-1)) \simeq \frac{(m_1+r-1)!}{(m_1-1)!}\left(\frac{m_1+m_2}{m_1}\right)^{r}\Phi\left\{\frac{m_2(2r-1)+m_1}{2(m_1m_2)^{1/2}(m_1+m_2+r-1)^{1/2}}\right\}$$

$$\qquad + \frac{(m_2+r-1)!}{(m_2-1)!}\left(\frac{m_1+m_2}{m_2}\right)^{r}\Phi\left\{\frac{m_1(2r-1)+m_2}{2(m_1m_2)^{1/2}(m_1+m_2+r-1)^{1/2}}\right\}. \tag{10}$$

If the $\Phi$ functions are expanded in exponential series and terms of order $m_1^{-1/2}$ or $m_2^{-1/2}$ 
neglected, further approximations to (10) can be obtained. In particular the mean value of n 
is approximately 

$$E(n) \simeq m_1 + m_2 + \left\{\frac{(m_1+m_2)^3}{2\pi m_1m_2}\right\}^{1/2} \tag{11}$$

and the variance of n for the symmetric case $m_1 = m_2 = m$ and $p_1 = p_2 = \frac{1}{2}$ is 

$$\text{var}(n) \simeq 2m(1 - 2\pi^{-1}) + 2m^{1/2}\pi^{-1/2} + \pi^{-1}. \tag{12}$$


Exact values of the expectation and variance of n have been calculated for the symmetric 
case, for some typical values of m, and are shown in columns (1) and (3) respectively in 
Table 1. Columns (2) and (4) give the corresponding approximate values obtained from (11), 
putting $m_1 = m_2 = m$, and (12), respectively. Table 2 gives values of E(n) for the case when 
$m_1$, $m_2$ are proportional to $p_1$, $p_2$ respectively for $p_1$ = 0.5(0.1)0.9 and $M = m_1 + m_2$ = 20, 50, 
100. The figures shown in column (1) are the exact values obtained from (9) putting r = 1, 
while column (2) gives approximate values obtained from (11). 


Table 1. Values of E(n) and var (n) for the symmetric case $p_1 = p_2 = \frac{1}{2}$, $m_1 = m_2 = m$ 

(1) E(n): exact; (2) E(n): approximation (11); (3) var (n): exact; (4) var (n): approximation (12). 

 m        (1)        (2)        (3)        (4)

  5     12.461     12.523      6.405      6.475
 10     23.524     23.568     11.106     11.154
 15     34.334     34.370     15.551     15.590
 20     45.015     45.046     19.866     19.900
 30     66.155     66.180     28.275     28.302
 40     87.114     87.137     36.502     36.525
 50    107.959    107.979     44.615     44.634

The approximations in Tables 1 and 2 tend to overestimate the true values of E(n) and 
var (n), but the errors become practically negligible with increasing $m_1$ and $m_2$. 
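The tabulated values for k = 2 can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions: it reads the exact mean from (9) with r = 1 as $m_1p_1^{-1}I_{p_2}(m_2, m_1+1) + m_2p_2^{-1}I_{p_1}(m_1, m_2+1)$, reads (11) as $m_1 + m_2 + \{(m_1+m_2)^3/(2\pi m_1m_2)\}^{1/2}$ and (12) as $2m(1-2/\pi) + 2(m/\pi)^{1/2} + 1/\pi$, and evaluates the incomplete beta function through the binomial tail. The function names are hypothetical.

```python
from math import comb, pi, sqrt


def binomial_upper_tail(n_trials, p, a):
    """Pr{Bin(n_trials, p) >= a}, which equals I_p(a, n_trials - a + 1)."""
    return sum(comb(n_trials, w) * p ** w * (1 - p) ** (n_trials - w)
               for w in range(a, n_trials + 1))


def exact_mean(m1, m2, p1):
    """E(n) for two strata, from the factorial-moment formula with r = 1."""
    p2 = 1.0 - p1
    n_trials = m1 + m2  # I_p(a, b) with a + b - 1 = m1 + m2
    return (m1 / p1 * binomial_upper_tail(n_trials, p2, m2)
            + m2 / p2 * binomial_upper_tail(n_trials, p1, m1))


def approx_mean(m1, m2):
    """Approximate E(n) when the quotas are proportional to p1, p2."""
    total = m1 + m2
    return total + sqrt(total ** 3 / (2.0 * pi * m1 * m2))


def approx_variance_symmetric(m):
    """Approximate var(n) for m1 = m2 = m, p1 = p2 = 1/2."""
    return 2.0 * m * (1.0 - 2.0 / pi) + 2.0 * sqrt(m / pi) + 1.0 / pi


for m in (5, 10, 15, 20, 30, 40, 50):
    print(m, round(exact_mean(m, m, 0.5), 3), round(approx_mean(m, m), 3),
          round(approx_variance_symmetric(m), 3))
```

The printed columns correspond to columns (1), (2) and (4) of Table 1.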
Table 2. Values of E(n) when $m_1$, $m_2$ are proportional to $p_1$, $p_2$, respectively 
(1) E(n): exact; (2) E(n): approximation (11). 

                m1 + m2 = 20         m1 + m2 = 50         m1 + m2 = 100
 p1    p2      (1)      (2)         (1)      (2)         (1)       (2)

 0.5   0.5    23.524   23.568      55.614   55.642     107.959   107.979
 0.6   0.4    23.594   23.642      55.728   55.758     108.132   108.143
 0.7   0.3    23.833   23.893      56.117   56.156     108.688   108.706
 0.8   0.2    24.364   24.460      56.991   57.052     109.948   109.974
 0.9   0.1    25.703   25.947      59.246   59.403     113.239   113.298


(b) k = 3, $m_i = m$ and $p_i = \frac{1}{3}$ (i = 1, 2, 3) 

For the trinomial symmetric case, the rth ascending factorial moment of n may be written 

$$E(n(n+1)\ldots(n+r-1)) = 3^{r+1}\left\{\frac{(m+r-1)!}{(m-1)!}\,I_{2/3}(2m, m+r) - \sum_{s=0}^{m-1}\frac{(m+s+r-1)!}{(m-1)!\,s!}\,\frac{1}{2^{m+s+r-1}}\,I_{1/3}(2m-s, m+s+r)\right\}. \tag{13}$$

In particular for r = 1, the mean value of n after some algebraic manipulation is 


E(n) = 9m > Pe, ) (6 I,(m,m +8). (14) 


Table 3. Values of E(n) for the symmetric trinomial case 

  m      E(n) Exact (15)     E(n) Approx. (16)

  1           5.50                 5.95
  2           9.64                10.00
  3          13.49                13.81
  4          17.19                17.49
  5          20.81                21.09
 10          38.20                38.44
 20          71.56                71.77

An alternative expression, derived by Young (1960), which contains a fixed (small) 
number of terms, for all m, and is therefore more useful for computational purposes is 


(3m)! 3-3" 


(m—1)!m! (m+1)!" _— 


2 

E(n) = 3m+ om("*") 2-2 T,(m + 2, 2m + 1) + (9m + 5) 

The following approximation to (15), which neglects terms of order less than or equal 

to m-}, is obtained by simple application of Stirling’s formula to the factorials and the 
standard Normal approximation to the incomplete beta function 


E(n) = 3m+= /—+——. (16) 


Table 3, which gives values of &(n), for some small values of m, indicates the increasing 


accuracy of (16) with increasing m. 
(c) $m_i = 2$, $p_i = k^{-1}$ (i = 1, 2, ..., k) 
In this case (8) simplifies to 

$$E(n(n+1)\ldots(n+r-1)) = k^{r+1}\sum_{t=0}^{k-2}(-1)^t\binom{k-1}{t}\sum_{j=0}^{t}\binom{t}{j}\frac{(r+j+1)!}{(t+1)^{r+j+2}}\,I_{1-(t+1)/k}(2k-j-2, r+j+2). \tag{17}$$

Values of E(n) have been calculated from (17), putting r = 1, for k = 2(1)5 and are shown in 
column (1) in Table 4. In column (2), corresponding values of E(n) for the case $m_i = 1$, 
$p_i = k^{-1}$ (i = 1, 2, ..., k) are given, calculated from the well-known formula 

$$E(n) = k\sum_{j=1}^{k}j^{-1}. \tag{18}$$


The figures in column (3) are the ratios of expected total 'excess' of individuals for m = 1 
to the expected total 'excess' for m = 2. These ratios appear to be fairly stable with respect 
to variation in k. 

Table 4 

(1) E(n) when $m_i = 2$, $p_i = k^{-1}$; (2) E(n) when $m_i = 1$, $p_i = k^{-1}$ (i = 1, 2, ..., k); 
(3) (E(n | m = 1) − k)/(E(n | m = 2) − 2k). 

 k       (1)        (2)       (3)

 2      5.500      3.000     0.667
 3      9.639      5.500     0.687
 4     14.186      8.333     0.701
 5     19.041     11.417     0.709
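The entries in columns (1) and (2) of Table 4 can also be checked without the incomplete beta function, by conditioning on the stratum of the next draw. The sketch below uses that first-step recursion, which is a swapped-in technique and not the paper's own route: if d is the vector of outstanding deficits, then $E(d) = \{1 + \sum_{i: d_i>0} p_iE(d - e_i)\}/\sum_{i: d_i>0} p_i$, with E(0) = 0. The function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def expected_least_sample_number(quotas, probs):
    """Exact E(n) by first-step analysis on the vector of unmet quotas."""
    probs = tuple(Fraction(p) for p in probs)

    @lru_cache(maxsize=None)
    def expected(deficits):
        unmet = [i for i, d in enumerate(deficits) if d > 0]
        if not unmet:
            return Fraction(0)  # all quotas filled: no further draws needed
        total = Fraction(1)     # the draw about to be made
        for i in unmet:
            reduced = deficits[:i] + (deficits[i] - 1,) + deficits[i + 1:]
            total += probs[i] * expected(reduced)
        # Draws from already filled strata leave the state unchanged, so the
        # equation is solved by dividing by the probability of a useful draw.
        return total / sum(probs[i] for i in unmet)

    return expected(tuple(quotas))


for k in (2, 3, 4, 5):
    p = (Fraction(1, k),) * k
    e1 = expected_least_sample_number((1,) * k, p)
    e2 = expected_least_sample_number((2,) * k, p)
    print(k, float(e2), float(e1), float((e1 - k) / (e2 - 2 * k)))
```

Exact rational arithmetic is used so that the comparison with the tabulated three-decimal values is not clouded by rounding in the recursion itself.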


4. UPPER SIGNIFICANCE LIMITS FOR n 


Finally, to complete this paper, the problem of determining the upper percentage points 
of the distribution of n will be considered. An approximation, which is sufficiently accurate 
to determine the correct integer-valued percentage points in nearly all cases, whatever the 
values of $m_i$, $p_i$ (i = 1, 2, ..., k) and k, is obtained as follows. Consider the stage in the sequential 
sampling procedure when n* individuals ($n^* \ge M = \sum m_i$) have been randomly drawn 
from the population. If $n_i$ of the n* individuals belong to $w_i$, then 

$$\Pr\{n > n^*\} = \Pr\left\{\bigcup_{i=1}^{k}E_i\right\}, \tag{19}$$

where $E_i$ denotes the event $n_i < m_i$, i.e. 

$$\Pr\{n > n^*\} = \sum_{i=1}^{k}\Pr\{E_i\} - \sum_{i<j}\Pr\{E_iE_j\} + \sum_{i<j<l}\Pr\{E_iE_jE_l\} - \ldots. \tag{20}$$

If this series is terminated, the remainder is less in absolute magnitude than the first term 
omitted and of the same sign. For the present purpose, it is convenient to use the inequality 

$$0 \le \sum_{i=1}^{k}\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i} - \Pr\{n \ge n^*+1\} \le \sum_{i<j}\sum_{t_i=0}^{m_i-1}\sum_{t_j=0}^{m_j-1}\frac{n^*!\,p_i^{t_i}p_j^{t_j}(1-p_i-p_j)^{n^*-t_i-t_j}}{t_i!\,t_j!\,(n^*-t_i-t_j)!}. \tag{21}$$
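(21) may be replaced by the weaker inequality 

$$0 \le \sum_{i=1}^{k}\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i} - \Pr\{n \ge n^*+1\} \le \sum_{i<j}\left\{\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i}\right\}\left\{\sum_{t_j=0}^{m_j-1}\binom{n^*}{t_j}p_j^{t_j}(1-p_j)^{n^*-t_j}\right\}.$$

Hence 

$$0 \le \sum_{i=1}^{k}\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i} - \Pr\{n \ge n^*+1\} \le \frac{1}{2}\left\{\sum_{i=1}^{k}\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i}\right\}^2. \tag{22}$$

For n* sufficiently large, the bound on the right-hand side of (22) will be negligible and 

$$\Pr\{n \ge n^*+1\} \simeq \sum_{i=1}^{k}\sum_{t_i=0}^{m_i-1}\binom{n^*}{t_i}p_i^{t_i}(1-p_i)^{n^*-t_i} = \sum_{i=1}^{k}I_{1-p_i}(n^*-m_i+1, m_i). \tag{23}$$

Defining the nominal upper 100$\alpha$ % point $n_\alpha$ of the distribution of n as the least integer 
value satisfying the inequality 

$$\Pr\{n \ge n_\alpha\} \le \alpha$$

and applying (23), approximate solutions for the $n_\alpha$ are obtained as the least integer values 
for which 

$$\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i, m_i) \le \alpha. \tag{24}$$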


From (22) it follows that the approximate significance level, as computed by the left-hand 
side of (24), differs from the exact significance level by less than 

$$\frac{1}{2}\left\{\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i, m_i)\right\}^2,$$

which, from (24), cannot exceed $\frac{1}{2}\alpha^2$. Clearly for $\alpha$ of the order 0.05, 0.01, this error in 
significance level will be negligible for most practical purposes. 
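The search for the least integer satisfying (24) can be sketched as follows. The sketch assumes the reading $\sum_i I_{1-p_i}(n_\alpha - m_i, m_i) \le \alpha$, that is, the sum of the binomial probabilities $\Pr\{n_i \le m_i - 1\}$ after $n_\alpha - 1$ drawings, which is the reading that reproduces columns (a) of Table 5. The function names are hypothetical.

```python
from math import comb


def binomial_cdf(n_trials, p, upto):
    """Pr{Bin(n_trials, p) <= upto}."""
    return sum(comb(n_trials, t) * p ** t * (1 - p) ** (n_trials - t)
               for t in range(0, min(upto, n_trials) + 1))


def approximate_level(n_alpha, quotas, probs):
    """Sum over strata of Pr{stratum still short of quota after n_alpha - 1 draws}."""
    n_star = n_alpha - 1
    return sum(binomial_cdf(n_star, p, m - 1) for m, p in zip(quotas, probs))


def upper_percentage_point(quotas, probs, alpha):
    """Least integer n_alpha whose approximate level does not exceed alpha."""
    n_alpha = sum(quotas)  # sampling cannot finish before M drawings
    while approximate_level(n_alpha, quotas, probs) > alpha:
        n_alpha += 1
    return n_alpha


point = upper_percentage_point((2, 2), (0.5, 0.5), 0.05)
level = approximate_level(point, (2, 2), (0.5, 0.5))
```

Because each binomial term decreases as the number of drawings grows, a simple upward search from M is sufficient; the returned level overstates the exact one by less than half its own square.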
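Developing the argument further, (22) may be applied in the form 

$$\Pr\{n \ge n_\alpha + 1\} \ge \sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i + 1, m_i) - \frac{1}{2}\left\{\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i + 1, m_i)\right\}^2.$$

The least integer values $n_\alpha$, as given by (24), must then be the correct nominal percentage 
points if 

$$\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i + 1, m_i) - \frac{1}{2}\left\{\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i + 1, m_i)\right\}^2 > \alpha$$

or, neglecting terms of order less than or equal to $\alpha^3$, if 

$$\sum_{i=1}^{k}I_{1-p_i}(n_\alpha - m_i + 1, m_i) > \alpha + \tfrac{1}{2}\alpha^2. \tag{25}$$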


For $m_i$ small, values of $n_\alpha$ may be found directly from (24), using tables of the incomplete 
beta function (ed. K. Pearson (1934)). If, however, the $m_i$ are large and the $p_i$'s not too small 
the incomplete beta functions may be approximated by Normal probability integrals: 
the $n_\alpha$ are then the least integer values satisfying 

$$\sum_{i=1}^{k}\Phi\left\{\frac{m_i - \frac{1}{2} - (n_\alpha - 1)p_i}{\{(n_\alpha - 1)p_i(1-p_i)\}^{1/2}}\right\} \le \alpha. \tag{26}$$

If the inequality in (26) is replaced by an equality and terms of order $m^{-1/2}$ neglected then, 
for the symmetric case, 

$$n_\alpha = km + \{k(k-1)m\}^{1/2}\lambda_\alpha + \tfrac{1}{2}(k-1)\lambda_\alpha^2 - \tfrac{1}{2}(k-2), \tag{27}$$

where $\lambda_\alpha$ is the upper 100$\alpha$/k percentage point of the standard Normal distribution. 
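Approximation (27) is simple enough to evaluate directly. The sketch below assumes the reading $n_\alpha = km + \{k(k-1)m\}^{1/2}\lambda_\alpha + \frac{1}{2}(k-1)\lambda_\alpha^2 - \frac{1}{2}(k-2)$ and obtains $\lambda_\alpha$ from the standard library's normal quantile function; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_upper_point(k, m, alpha):
    """Normal approximation to the upper 100*alpha % point, symmetric case."""
    lam = NormalDist().inv_cdf(1.0 - alpha / k)  # upper 100*alpha/k % point
    return (k * m + sqrt(k * (k - 1) * m) * lam
            + 0.5 * (k - 1) * lam ** 2 - 0.5 * (k - 2))


for k in (2, 3, 4, 5, 6):
    print(k, round(approx_upper_point(k, 10, 0.05), 2),
          round(approx_upper_point(k, 10, 0.01), 2))
```

In practice the next larger integer would be used as the significance limit.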


For the symmetric case, an alternative approximation to (27) may be used. From (19) 

$$\Pr\{n \le n^*\} = \Pr\{n_1 \ge m_1, n_2 \ge m_2, \ldots, n_k \ge m_k\}, \tag{28}$$

$n_i$ being the number of individuals drawn from $w_i$ in a sample of n* unrestricted drawings 
from the population. 

If $m_1 = m_2 = \ldots = m_k$ and $p_1 = p_2 = \ldots = p_k$, then 

$$\Pr\{n \le n^*\} = \Pr\{\min_i n_i \ge m\}. \tag{29}$$

Johnson & Young (1960) showed that the joint distribution of the standardized deviates 
$(kn_i - N)\{N(k-1)\}^{-1/2}$ of the symmetrical multinomial distribution, with index N and equal 
parameters $k^{-1}$, may be approximated by that of the deviates from the sample mean 

$$\left(\frac{k}{k-1}\right)^{1/2}(x_1 - \bar x),\quad \left(\frac{k}{k-1}\right)^{1/2}(x_2 - \bar x),\quad \ldots,\quad \left(\frac{k}{k-1}\right)^{1/2}(x_k - \bar x),$$

where $x_1, x_2, \ldots, x_k$ are k independent unit normal variables. 

In particular the minimum multinomial frequency is approximately distributed as 
$Nk^{-1}(1 - k^{1/2}N^{-1/2}u_k)$, where $u_k$ is the extreme positive deviate from the sample mean in a 
sample of k independent unit normal variables. Applying this in (29) gives 

$$\Pr\{n \le n^*\} \simeq \Pr\left\{u_k \le \frac{n^* - km + \frac{1}{2}k}{(kn^*)^{1/2}}\right\}. \tag{30}$$

The factor $\frac{1}{2}k$ in (30) arises from the use of a continuity correction $\frac{1}{2}$ which is simply half 
the interval between the consecutive values of the distribution of n. From (30), it follows 
that the 100$\alpha$ % point $n_\alpha$ of the distribution of n is approximately given by 

$$\frac{n_\alpha - 1 - km + \frac{1}{2}k}{\{k(n_\alpha - 1)\}^{1/2}} = u_{k,\alpha}, \tag{31}$$

where $u_{k,\alpha}$ is the upper 100$\alpha$ % point of the distribution of $u_k$. 

Table 25 of Pearson & Hartley (1958) gives values of $u_{k,\alpha}$ obtained from the probability 
integral tabulated by Nair (1948). Finally, if terms of order less than or equal to $m^{-1/2}$ are 
neglected then the solution to (31) is 

$$n_\alpha = km - \tfrac{1}{2}(k-2) + \tfrac{1}{2}ku_{k,\alpha}^2 + ku_{k,\alpha}\sqrt{m}. \tag{32}$$


In a preliminary investigation, the least integer values $n_\alpha$ satisfying (24) have been 
calculated for the symmetric case for k = 2, 3, 4, $\alpha$ = 0.05, 0.01 and m = 1(1)6. These are 
shown in columns (a) of Table 5. The figures given in brackets are the corresponding approximate 
significance levels as computed by the left-hand side of (24). These differ from the 
exact significance levels by less than $\frac{1}{2}\alpha^2$. Columns (b) and (c) show approximate values of 
$n_\alpha$ as calculated from (27) and (32), respectively. These values, although given formally to 
two decimal places, would, in fact, give practical procedures identical with those given by 
the next larger integer values. 

For k = 2, the values of $n_\alpha$ as given in columns (a), (b) and (c) would, in all cases, give 
identical practical procedures. For k > 2, approximations (27) and (32) give too large a 
value of $n_\alpha$ and hence too small a significance level. However, there are clear indications, 
especially for k = 3, that this is due to the relatively small values of m that are considered 
in Table 5. Table 6, which has been constructed using (27), gives approximate values of $n_\alpha$ 
Table 5. Upper percentage points of the distribution of n when 
$m_i = m$, $p_i = k^{-1}$ (i = 1, 2, ..., k) 

(a) Inequality (24), (b) approximation (27), (c) approximation (32). 

                         α = 0.05                          α = 0.01
          k     (a)            (b)     (c)        (a)            (b)     (c)

m = 1     2      7 (0.0312)    6.69    6.69        9 (0.0078)    8.96    8.96
          3     12 (0.0347)   12.24   12.26       16 (0.0069)   16.51   16.55
          4     17 (0.0401)   18.30   18.29       22 (0.0095)   24.54   24.53
m = 2     2     10 (0.0391)    9.84    9.84       13 (0.0063)   12.47   12.47
          3     17 (0.0411)   17.40   17.42       21 (0.0099)   22.26   22.31
          4     24 (0.0464)   25.52   25.50       31 (0.0079)   32.57   32.56
m = 3     2     13 (0.0386)   12.72   12.72       16 (0.0074)   15.63   15.63
          3     22 (0.0385)   22.06   22.08       27 (0.0075)   27.37   27.43
          4     31 (0.0424)   31.98   31.97       38 (0.0083)   39.66   39.65
m = 4     2     16 (0.0352)   15.47   15.47       19 (0.0075)   18.60   18.60
          3     26 (0.0451)   26.45   26.48       32 (0.0073)   32.15   32.21
          4     37 (0.0420)   38.06   38.05       44 (0.0097)   46.27   46.25
m = 5     2     18 (0.0490)   17.49   17.49       22 (0.0072)   21.46   21.45
          3     31 (0.0486)   30.68   30.71       36 (0.0090)   36.72   36.78
          4     43 (0.0434)   43.89   43.88       51 (0.0084)   52.56   52.54
m = 6     2     21 (0.0414)   20.71   20.71       25 (0.0066)   24.24   24.24
          3     35 (0.0391)   34.80   34.82       41 (0.0077)   41.14   41.20
          4     48 (0.0497)   49.55   49.53


Table 6. Approximate upper significance limits for n when $m_i = m$, $p_i = k^{-1}$ (i = 1, 2, ..., k) 

            m = 10              m = 15              m = 20
 k       0.05     0.01       0.05     0.01       0.05     0.01

 2      30.69    34.84      42.66    47.43      54.32    59.61
 3      50.51    57.88      69.22    77.60      87.34    96.58
 4      71.09    81.57      96.61   108.48     121.26   134.30
 5      92.22   105.77     124.62   139.92     155.85   172.63
 6     113.79   130.38     153.11   171.81     190.97   211.44

            m = 25              m = 30              m = 35
 k       0.05     0.01       0.05     0.01       0.05     0.01

 2      65.78    71.53      77.10    83.27      88.32    94.87
 3     105.09   115.09     122.58   133.26     139.87   151.18
 4     145.06   159.44     169.06   184.08     192.47   208.35
 5     186.34   204.43     216.31   235.57     245.87   266.22
 6     227.89   249.93     264.15   287.60     299.90   324.65

            m = 40              m = 45              m = 50
 k       0.05     0.01       0.05     0.01       0.05     0.01

 2      99.45   106.36     110.52   117.75     121.52   129.08
 3     157.00   168.89     173.99   186.43     190.89   203.85
 4     215.64   232.32     238.62   256.05     261.44   279.58
 5     275.12   296.48     304.11   326.41     332.89   356.09
 6     335.26   361.22     370.29   397.39     405.05   433.22
for m = 10(5)50, k = 2(1)6 and $\alpha$ = 0.05, 0.01. For k = 2, 3, (27) almost certainly determines 
the correct integer-valued nominal percentage points. For the larger values of k, 
and m small, the significance limits may be in error by one or two at the most. 
Large-sample estimation of parameters for moving-average models 


By A. M. WALKER 
Statistical Laboratory, University of Cambridge 


1. INTRODUCTION 


Let $\{x_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, be a stationary normal moving-average process, defined by 

$$x_t = \mu + \epsilon_t + \beta_1\epsilon_{t-1} + \ldots + \beta_h\epsilon_{t-h}, \tag{1}$$

where $\{\epsilon_t\}$ is a set of independent random variables, each distributed normally with mean 
zero and variance $\sigma^2$. The problem of making inferences about the parameters $\beta_1, \beta_2, \ldots, \beta_h$, 
given a sample of consecutive observations $(x_1, x_2, \ldots, x_n)$ from the process, is a well-known 
one in time-series analysis. Little progress with this seems to have been made except under 
the assumption that n is large, but in the large-sample case the work of Whittle (1951, 
Chapter 7, 1953, 1954, pp. 211-18) enables one to obtain, at least in principle, a solution 
which for most purposes can be regarded as complete. From this work it follows that, 
provided that the roots of the equation $z^h + \beta_1z^{h-1} + \ldots + \beta_h = 0$ all have moduli less than 
unity, the logarithm of the likelihood of the sample is given asymptotically by 

$$L \simeq -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{8\pi^2}\int_{-\pi}^{\pi}\left|\sum_{t=1}^{n}(x_t - \mu)e^{it\omega}\right|^2\{f(\omega)\}^{-1}\,d\omega, \tag{2}$$

where $f(\omega) = (1 + \beta_1e^{i\omega} + \ldots + \beta_he^{ih\omega})(1 + \beta_1e^{-i\omega} + \ldots + \beta_he^{-ih\omega})\,\sigma^2/2\pi$ 
is the spectral density of the process $\{x_t\}$. Also the second term in (2) may be replaced by 

$$-\frac{n}{2\sigma^2}\sum_{s=-\phi(n)}^{\phi(n)}\alpha_sC_s,$$

where for $s \ge 0$, $C_s = \sum_{t=1}^{n-s}(x_t - \mu)(x_{t+s} - \mu)/(n-s)$ is the sample serial covariance for lag s, 
and $C_{-s} = C_s$, $\alpha_s$ is the coefficient of $e^{is\omega}$ in the Fourier expansion of 

$$\{g(\omega)\}^{-1} = \sigma^2\{2\pi f(\omega)\}^{-1} = \{(1 + \beta_1e^{i\omega} + \ldots + \beta_he^{ih\omega})(1 + \beta_1e^{-i\omega} + \ldots + \beta_he^{-ih\omega})\}^{-1}, \tag{3}$$

and $\phi(n) \to \infty$, $n^{-1}\phi(n) \to 0$ as $n \to \infty$. Asymptotically efficient estimates $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_h$ are 
therefore obtained by minimizing $U = \sum_{s=-\phi(n)}^{\phi(n)}\alpha_sC_s$ with respect to $\beta_1, \beta_2, \ldots, \beta_h$. (If $\mu$ is not 
specified a priori it is easily seen that we can merely replace $\mu$ by $\bar x = \sum_{t=1}^{n}x_t/n$ in the definition 
of $C_s$.) These estimates are asymptotically normal, the limiting joint distribution of 
$\sqrt{n}(\hat\beta_i - \beta_i)$ (i = 1, 2, ..., h) being multinormal with mean 0 and covariance matrix the 
inverse of the matrix whose (i, j)th element is the constant term in the Fourier expansion of 

$$\frac{1}{2}\,\frac{\partial\log g(\omega)}{\partial\beta_i}\,\frac{\partial\log g(\omega)}{\partial\beta_j} \qquad (i, j = 1, 2, \ldots, h).$$

Tests of hypotheses that certain of the $\beta_i$ have specified values may also be made by means 
of the likelihood ratio principle, and the standard asymptotic theory for such tests is 
applicable. 
The condition on the roots of $z^h + \beta_1z^{h-1} + \ldots + \beta_h$ is incidentally not very restrictive since 
any stationary normal moving-average process has a 'regular' representation of the form 
(1) such that the moduli of the roots of $z^h + \beta_1z^{h-1} + \ldots + \beta_h = 0$ do not exceed unity (see, 
for example, Wold, 1954, p. 126). The only cases not covered are therefore those for which 
there is at least one root of modulus unity, arising when the spectral density of the process 
has at least one zero. 

There are, however, various difficulties which may occur in using Whittle’s method in practice. It might be thought that the obvious procedure for obtaining $\hat\beta_1, \hat\beta_2, \dots, \hat\beta_h$, by solving the equations $\partial U/\partial\beta_i = 0$ $(i = 1, 2, \dots, h)$ by successive approximation, taking as a first approximation the estimates obtained by equating the sample serial correlations $r_i = C_i/C_0$ $(i = 1, 2, \dots, h)$ to the corresponding autocorrelations $\rho_i$ of the process, would usually be satisfactory. However, the asymptotic efficiency of the first approximation will not be high unless $\beta_1, \beta_2, \dots, \beta_h$ are small, so that the approximation process may converge slowly or perhaps even not at all; moreover, there may be an appreciable probability of the first approximation values not being real. This will happen if the polynomial $P(w)$ defined by $P(z + z^{-1}) = \sum_{s=-h}^{h} r_s(z^s + z^{-s})$ has a zero with modulus less than 2 which is not of even multiplicity (Wold, 1954, p. 154). Also the expression of the $\alpha_s$ as explicit functions of $\beta_1, \beta_2, \dots, \beta_h$, obtained by putting $\{g(\omega)\}^{-1}$ into partial fractions which are then expanded, will in general involve very tedious algebra. (The problem is the same as that of expressing the autocovariances of an autoregressive process in terms of its coefficients, which is considered by Quenouille (1947, pp. 365-6).) The results are quite complicated even when $h = 2$, for which

$$\{g(\omega)\}^{-1} = -\frac{A_2}{\beta_2} + \frac{z(A_1 + A_2 z)}{1 + \beta_1 z + \beta_2 z^2} + \frac{z^{-1}(A_1 + A_2 z^{-1})}{1 + \beta_1 z^{-1} + \beta_2 z^{-2}}$$

(where $z = e^{i\omega}$), whence

$$A_1 = \frac{-\beta_1}{(1 - \beta_2)\{(1 + \beta_2)^2 - \beta_1^2\}}, \qquad A_2 = \frac{-\beta_2(1 + \beta_2)}{(1 - \beta_2)\{(1 + \beta_2)^2 - \beta_1^2\}},$$

so that $\alpha_0 = -A_2/\beta_2$, $\alpha_1 = \alpha_{-1} = A_1$, $\alpha_2 = \alpha_{-2} = A_2 - \beta_1 A_1$, etc.
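The closed forms for $h = 2$ can be checked against a direct numerical evaluation of the Fourier coefficients of $\{g(\omega)\}^{-1}$. The following is a minimal sketch, assuming NumPy; the function names and the illustrative values $\beta_1 = 0.5$, $\beta_2 = 0.2$ are hypothetical and not taken from the paper, and the closed form assumes $\beta_2 \neq 0$.

```python
import numpy as np

def alpha_closed_form(b1, b2):
    """alpha_0, alpha_1, alpha_2 for h = 2 from the partial-fraction constants A_1, A_2."""
    d = (1.0 - b2) * ((1.0 + b2) ** 2 - b1 ** 2)
    a1 = -b1 / d                    # A_1
    a2 = -b2 * (1.0 + b2) / d       # A_2
    return np.array([-a2 / b2, a1, a2 - b1 * a1])

def alpha_numeric(b1, b2, smax=2, m=4096):
    """Fourier coefficients of 1/g(w), averaged over an equispaced grid on [0, 2*pi)."""
    w = 2.0 * np.pi * np.arange(m) / m
    z = np.exp(1j * w)
    g = np.abs(1.0 + b1 * z + b2 * z ** 2) ** 2
    return np.array([np.mean(np.cos(s * w) / g) for s in range(smax + 1)])

beta1, beta2 = 0.5, 0.2   # illustrative; the roots of z^2 + 0.5 z + 0.2 lie inside the unit circle
print(alpha_closed_form(beta1, beta2))
print(alpha_numeric(beta1, beta2))
```

The two printed vectors should agree to rounding error, which also illustrates that the $\alpha_s$ are the autocovariances of the corresponding second-order autoregressive process.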


The equations $\partial U/\partial\beta_i = 0$ will therefore be very cumbersome. Direct evaluation of the constant terms in the expansions of

$$\frac{1}{2}\,\frac{\partial \log g(\omega)}{\partial \beta_i}\,\frac{\partial \log g(\omega)}{\partial \beta_j},$$

which may be regarded as elements of the information matrix (per observation), is also troublesome, but can be avoided by using the fact that this information matrix is equal to the covariance matrix of $h$ consecutive observations from a stationary autoregressive process $\{y_t\}$ such that

$$y_t + \beta_1 y_{t-1} + \dots + \beta_h y_{t-h} = \xi_t,$$

where $\{\xi_t\}$ is a set of independent random variables with mean zero and variance unity. For the elements of the inverse of the latter covariance matrix (which is therefore equal to the asymptotic covariance matrix of $\sqrt{n}(\hat\beta_i - \beta_i)$ $(i = 1, 2, \dots, h)$) are readily obtained in terms of $\beta_1, \beta_2, \dots, \beta_h$ (see Durbin, 1959, p. 311, equation (19)).

With an electronic computer, one might determine the values $\hat\beta_i$ for which $U$ is a minimum by evaluating $U$ for a sufficiently large number of sets of values of the $\beta_i$, but even then the computations could be fairly laborious.

There is also what may be called the ‘truncation’ problem, the choice of the maximum lag $\phi(n)$ of the sample serial correlations occurring in the sum which gives $U$. Although the precise form of $\phi(n)$ does not affect the limiting distribution of the estimates $\hat\beta_i$, it will have a considerable influence on the accuracy of this limiting distribution as an approximation to the actual distribution for a given finite value of $n$. Presumably the more rapidly $\phi(n)$ increases with $n$, the more rapidly will the variances and covariances of the estimates approach their asymptotic values, but the more slowly will their biases decrease in magnitude, and their distribution tend to normality, so that one would have to strike a balance between these effects in seeking a $\phi(n)$ which was in some sense optimum. Determination of such an optimum $\phi(n)$, which in any case would depend on the true values of the parameters $\beta_i$, seems to be a formidable problem. In practice one might proceed by taking $\phi(n)$ to be such that $|\alpha_s^*/\alpha_0^*|$ is less than some prescribed small quantity, say 0.01, for all $s > \phi(n)$, where $\alpha_s^*$ denotes the value of $\alpha_s$ obtained by replacing the $\beta_i$ by the first approximation to the $\hat\beta_i$ ($\alpha_s \to 0$ exponentially as $s \to \infty$). Alternatively one might determine a sequence of estimates $\hat\beta_i^{(k)}$ by minimizing $U_k = n\sum_{s=-k}^{k}\alpha_s C_s$ for values of $k$ forming an increasing sequence $\{k_r\}$, and take $\phi(n)$ to be the least $k_r$ such that the differences $|\hat\beta_i^{(k_r)} - \hat\beta_i^{(k_{r-1})}|$ are negligible. However, it is by no means clear that these two methods will always be satisfactory, and the second may require a very large amount of computation.

In this paper we present a modification of Whittle’s method which enables these difficulties to be avoided to a large extent, and also usually requires much less computation. The idea on which this is based is extremely simple; it is merely a question of observing that since $U = nC_0\sum_{s=-\phi(n)}^{\phi(n)}\alpha_s r_s$, estimates obtained by maximizing the likelihood of the first $\phi(n)$ sample serial correlations are equivalent to the asymptotically efficient estimates $\hat\beta_i$, and similarly that those obtained by maximizing the likelihood of the first $k$ sample serial correlations are equivalent to the $\hat\beta_i^{(k)}$, for any finite value of $k$. An exact expression for the likelihood of a set of sample serial correlations is not known but the approximation derived from their asymptotic distribution (which is multinormal) can be used. The asymptotic behaviour of the resulting estimates based on $r_1, r_2, \dots, r_k$ for any finite $k$ is easily determined, and consideration of their asymptotic efficiencies will often enable one to deal satisfactorily with the truncation problem (see §3).

An alternative method which enables asymptotically efficient estimates to be obtained with even less computation has been given by Durbin (1959). Here the moving-average process is approximated by an autoregressive process of order $k$, and the likelihood of the asymptotic distribution of the least-squares estimates of the coefficients of the autoregressive process (which are asymptotically equal to functions of $r_1, r_2, \dots, r_k$) is then maximized. This method has the advantage of giving estimating equations which are linear in the unknown parameters, and can therefore be solved without iteration. However, there is the problem of determining the value of $k$ (analogous to the truncation problem mentioned above), which Durbin does not discuss in detail, and this might sometimes make the application of the method difficult.

Another feature of the present method is that adjustments to the estimates to allow for their biases can be made fairly simply, as shown in §4. With Durbin’s method, on the other hand, the derivation of similar adjustments seems to be very tedious (the case of a first-order process is considered in the Appendix).

2. LARGE-SAMPLE ESTIMATES BASED ON A FINITE NUMBER OF SERIAL CORRELATIONS

Let $k$ be a fixed positive integer, and $L_k$ the logarithm of the likelihood of the asymptotic distribution of $r_1, r_2, \dots, r_k$, which is to be maximized. $L_k$ is most easily expressed as a function of the first $h$ autocorrelations $\rho_1, \rho_2, \dots, \rho_h$ of the moving-average process, and we therefore take these as the basic parameters rather than $\beta_1, \beta_2, \dots, \beta_h$. The $\beta_i$ are single-valued functions of the $\rho_i$, determined by

$$1 + \sum_{i=1}^{h}\beta_i z^{-i} = \prod_{i=1}^{h}(1 - z_i z^{-1}), \tag{4}$$

where $z_1, z_2, \dots, z_h$ are the roots of the equation $1 + \sum_{i=1}^{h}\rho_i(z^i + z^{-i}) = 0$ whose moduli are less than unity. This is an immediate consequence of the identity

$$1 + \sum_{i=1}^{h}\rho_i(z^i + z^{-i}) = \Big(1 + \sum_{i=1}^{h}\beta_i z^i\Big)\Big(1 + \sum_{i=1}^{h}\beta_i z^{-i}\Big)\Big/\Big(1 + \sum_{i=1}^{h}\beta_i^2\Big).$$

Thus we shall obtain estimates $\hat\rho_i$ of $\rho_i$, and take the estimates $\hat\beta_i$ to be the same functions of the $\hat\rho_i$. The $\hat\beta_i$ of course cannot be determined if the equation $1 + \sum_{i=1}^{h}\hat\rho_i(z^i + z^{-i}) = 0$ has roots of the form $z = e^{i\phi}$ (where $0 \le \phi \le \pi$), but the probability of this happening will usually be small. (When it does happen we may take the corresponding estimates $\hat z_i$ to be substituted for $z_i$ in (4) as $(e^{i\phi}, e^{-i\phi})$ repeated $m$ times for a root of even multiplicity $2m$, and $+1$ or $-1$ repeated $2m+1$ times for a root of odd multiplicity $2m+1$, according as $\phi$ is nearer to 0 or $\pi$, except that if $\phi = 0$ or $\pi$, when the root must be of multiplicity $2m$, we repeat $1$ or $-1$ $m$ times.)
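The mapping from the $\rho_i$ to the $\beta_i$ defined by (4) can be carried out numerically by finding the roots of the polynomial obtained on multiplying $1 + \sum\rho_i(z^i + z^{-i})$ by $z^h$. The sketch below assumes NumPy; the function names are hypothetical, and the case of roots on the unit circle (treated in the parenthesis above) is simply reported as an error rather than handled.

```python
import numpy as np

def rho_from_beta(beta):
    """Autocorrelations rho_1..rho_h of x_t = e_t + beta_1 e_{t-1} + ... + beta_h e_{t-h}."""
    b = np.concatenate(([1.0], np.asarray(beta, dtype=float)))
    acov = np.correlate(b, b, mode="full")[len(b) - 1:]   # lags 0..h, up to the factor sigma^2
    return acov[1:] / acov[0]

def beta_from_rho(rho):
    """Invert (4): keep the h roots of 1 + sum rho_i (z^i + z^-i) = 0 inside the unit circle."""
    rho = np.asarray(rho, dtype=float)
    h = len(rho)
    # Multiplying the equation by z^h gives a polynomial of degree 2h
    # with coefficients rho_h, ..., rho_1, 1, rho_1, ..., rho_h.
    roots = np.roots(np.concatenate((rho[::-1], [1.0], rho)))
    inside = roots[np.abs(roots) < 1.0]
    if len(inside) != h:
        raise ValueError("roots of modulus unity: beta is not determined by (4)")
    # z^h + beta_1 z^(h-1) + ... + beta_h = prod (z - z_i)
    return np.real(np.poly(inside))[1:]

print(beta_from_rho([0.4]))                       # first-order case, rho = 0.4
print(beta_from_rho(rho_from_beta([0.5, 0.2])))   # second-order round trip
```

For $h = 1$ this reduces to $\beta = \{1 - (1 - 4\rho^2)^{1/2}\}/2\rho$.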

Now as $n \to \infty$, the joint distribution of $\sqrt{n}(r_i - \rho_i)$ $(i = 1, 2, \dots, k)$ tends to the multinormal form with mean 0 and covariance matrix $W = (w_{ij})$, where

$$w_{ij} = \sum_{v=-\infty}^{\infty}\{\rho_v\rho_{v+i-j} + \rho_v\rho_{v+i+j} + 2\rho_i\rho_j\rho_v^2 - 2\rho_i\rho_v\rho_{v+j} - 2\rho_j\rho_v\rho_{v+i}\}, \tag{5}$$

$\rho_v$ denoting the correlation between $x_t$ and $x_{t+v}$ for any integer $v$, so that $\rho_{-v} = \rho_v$, and $\rho_v = 0$ when $|v| > h$ (see, for example, Hannan, 1960, pp. 40-1). Hence

$$L_k = -\tfrac{1}{2}k\log 2\pi - \tfrac{1}{2}\log|W| - \frac{n}{2}\sum_{i,j=1}^{k} w^{ij}(r_i - \rho_i)(r_j - \rho_j), \tag{6}$$

where $W^{-1} = (w^{ij})$ is the inverse of $W$, so that

$$\frac{\partial L_k}{\partial\rho_s} = -\frac{1}{2|W|}\frac{\partial|W|}{\partial\rho_s} - \frac{n}{2}\sum_{i,j=1}^{k}\frac{\partial w^{ij}}{\partial\rho_s}(r_i - \rho_i)(r_j - \rho_j) + n\Big\{\sum_{i=1}^{h} w^{is}(r_i - \rho_i) + \sum_{i=h+1}^{k} w^{is} r_i\Big\} \quad (s = 1, 2, \dots, h). \tag{7}$$

Now let us assume that the equations obtained by putting these partial derivatives equal to zero have a consistent solution, i.e. one that converges in probability to $\rho = (\rho_1, \rho_2, \dots, \rho_h)'$ when $n \to \infty$. The difference between this and $\rho$ will then be of order $n^{-1/2}$ (as can be seen, for example, by applying the mean value theorem to (7)). It follows that an asymptotically equivalent solution

$$\hat\rho^{(k)} = (\hat\rho_1^{(k)}, \hat\rho_2^{(k)}, \dots, \hat\rho_h^{(k)})',$$

is obtained by neglecting the first two terms on the right-hand side of (7), giving

$$\sum_{i=1}^{h}\hat w^{is}(r_i - \hat\rho_i^{(k)}) = -\sum_{i=h+1}^{k}\hat w^{is} r_i \quad (s = 1, 2, \dots, h),$$

where $\hat w^{is}$ is obtained from $w^{is}$ by replacing $\rho$ by $\hat\rho^{(k)}$. Thus if $(\hat a_{ij})$ is the inverse of the matrix consisting of the estimated first $h$ rows and columns of $W^{-1}$,

$$\hat\rho_i^{(k)} = r_i + \sum_{j=1}^{h}\hat a_{ij}\sum_{s=h+1}^{k}\hat w^{js} r_s = r_i + \sum_{s=h+1}^{k}\hat c_{is} r_s, \text{ say} \quad (i = 1, 2, \dots, h). \tag{8}$$

These equations may be solved by iteration, the $m$th approximation $\rho_m^{*(k)} = (\rho_{1m}^{*(k)}, \rho_{2m}^{*(k)}, \dots, \rho_{hm}^{*(k)})'$ to $\hat\rho^{(k)}$ being determined from the $(m-1)$th by

$$\rho_{im}^{*(k)} = r_i + \sum_{s=h+1}^{k} c_{is}(\rho_{m-1}^{*(k)})\, r_s,$$

and the first approximation being taken to be $\rho_{i1}^{*(k)} = r_i$. The values of the coefficients $c_{is}(\rho)$ for any particular $\rho$ are most conveniently determined from the set of equations

$$\sum_{s=h+1}^{k} c_{is}(\rho)\, w_{sj}(\rho) = -w_{ij}(\rho) \quad (j = h+1, \dots, k), \tag{9}$$

which they clearly satisfy. It should be noted that the formula (5) may be rewritten

$$w_{ij} = \lambda_{i-j} + \lambda_{i+j} + 2(\rho_i\rho_j\lambda_0 - \rho_i\lambda_j - \rho_j\lambda_i), \tag{10}$$

where the $\lambda_s$ are defined by

$$\Big\{1 + \sum_{i=1}^{h}\rho_i(z^i + z^{-i})\Big\}^2 = \sum_{s=-2h}^{2h}\lambda_s z^s.$$

It is easily seen that $\hat\rho^{(k)}$ is a consistent estimate of $\rho$, and is asymptotically normal, the asymptotic distribution of $\sqrt{n}(\hat\rho_i^{(k)} - \rho_i)$ being the same as that of

$$\sqrt{n}\Big\{r_i - \rho_i + \sum_{s=h+1}^{k} c_{is}(\rho)\, r_s\Big\},$$

provided that the functions $c_{is}$ are bounded for all $\rho$ which form a set of possible autocorrelations. That the $c_{is}$ satisfy this condition follows from the equations (9), which show that they are rational functions of $\rho$ with denominators which could be zero only if the determinant $|w_{sj}(\rho)|$ $(s, j = h+1, h+2, \dots, k)$ were zero. (For $|w_{sj}(\rho)| = 0$ implies that some linear relation between the $r_s$ holds with probability 1, which is clearly impossible for any $\rho$.) The equations (9) moreover state that

$$r_i + \sum_{s=h+1}^{k} c_{is}(\rho)\, r_s \quad (i = 1, 2, \dots, h)$$

are asymptotically uncorrelated with (and therefore independent of) $r_j$ $(j = h+1, h+2, \dots, k)$, so that we may write

$$r_i + \sum_{s=h+1}^{k} c_{is}(\rho)\, r_s = r_{i\cdot}(r_{h+1}, r_{h+2}, \dots, r_k),$$

the ‘residual’ obtained by subtracting from $r_i$ its (asymptotic) regression on $r_{h+1}, r_{h+2}, \dots, r_k$. Thus we have shown that $\hat\rho_i^{(k)}$ is asymptotically equivalent to $r_{i\cdot}(r_{h+1}, r_{h+2}, \dots, r_k)$. It should be noted that we have not needed to justify the assumption that the original maximum likelihood equations have a consistent solution, since we can suppose that the estimates are defined by the equations (8).

In applying the iteration method in practice it will usually not be worth while altering the arguments of the functions $c_{is}$ after the first two or three iterations. In fact if $\tilde\rho$ is any consistent estimate of $\rho$ the asymptotic behaviour of the quantities

$$r_i + \sum_{s=h+1}^{k} c_{is}(\tilde\rho)\, r_s$$

is the same as that of the $\hat\rho_i^{(k)}$, and the chief reason for altering the arguments is to improve the accuracy of the asymptotic distribution as an approximation for finite $n$, one of the sources of error in the approximation being the deviation of the coefficients of $r_s$ in the estimators from the corresponding coefficients in the true residual $r_{i\cdot}(r_{h+1}, r_{h+2}, \dots, r_k)$. Once the arguments of the $c_{is}$ are fixed, the solution of the equations (9) can be easily obtained even when there is some doubt as to a suitable value of $k$. (The choice of $k$ is discussed in §3; clearly there may be a considerable loss of asymptotic efficiency if $k$ is small.) For the method of triangular resolution of the matrix $(w_{sj})$, whereby a lower triangular matrix $L = (l_{sj})$ such that

$$w_{sj} = \sum_{i=h+1}^{j} l_{si}\, l_{ji} \quad (s \ge j = h+1, h+2, \dots, k)$$

is determined, may be used, and an element $l_{ij}$ depends only on the values of $w_{i'j'}$ for $i' \le i$, $j' \le j$, so that if it is desired that the value of $k$ should be increased, the values of $l_{ij}$ already computed do not need to be altered. (Compare Wold, 1949, who requires the triangular resolution of this same matrix.) It will often suffice to use a small initial value of $k$ to determine the estimate, $\tilde\rho$ say, which is to be substituted for $\rho$ in the equations (9), giving the coefficients $w_{sj}(\tilde\rho)$, $w_{ij}(\tilde\rho)$ which are to be kept fixed for the remainder of the calculations.
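The remark that increasing $k$ leaves the elements $l_{ij}$ already computed unchanged can be checked numerically. The sketch below assumes NumPy and a first-order process with $\tilde\rho = 0.38051$; `numpy.linalg.cholesky` is used here as a stand-in for the triangular resolution described above, and the helper name `w22_first_order` is hypothetical.

```python
import numpy as np

def w22_first_order(rho, k):
    """Rows and columns 2..k of W for a first-order process (a banded Toeplitz matrix)."""
    m = k - 1
    lag = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    band = np.array([1.0 + 2.0 * rho ** 2, 2.0 * rho, rho ** 2, 0.0])
    return band[np.minimum(lag, 3)]

rho_tilde = 0.38051
l_small = np.linalg.cholesky(w22_first_order(rho_tilde, 4))   # factor for k = 4
l_large = np.linalg.cholesky(w22_first_order(rho_tilde, 6))   # factor for k = 6

# The factor for k = 4 reappears as the leading block of the factor for k = 6,
# so only the new rows need to be computed when k is increased.
print(np.allclose(l_large[:3, :3], l_small))
```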
As an illustration of the method, consider its application to one of the series of 100 observations used by Durbin (1959, §8), which were generated by the model

$$x_t = \epsilon_t + \tfrac{1}{2}\epsilon_{t-1}.$$

For this series the first five sample serial correlations were:

$$r_1 = 0.35005, \quad r_2 = -0.06174, \quad r_3 = -0.08007, \quad r_4 = -0.14116, \quad r_5 = -0.15629.$$

For a first-order process the matrix $W$ is

$$\begin{pmatrix} 1 - 3\rho^2 + 4\rho^4 & 2\rho(1 - \rho^2) & \rho^2 & 0 & 0 & \cdots \\ 2\rho(1 - \rho^2) & 1 + 2\rho^2 & 2\rho & \rho^2 & 0 & \cdots \\ \rho^2 & 2\rho & 1 + 2\rho^2 & 2\rho & \rho^2 & \cdots \\ 0 & \rho^2 & 2\rho & 1 + 2\rho^2 & 2\rho & \cdots \\ \vdots & & & & & \ddots \end{pmatrix} \tag{11}$$

(writing $\rho$ for $\rho_1$ without ambiguity). Hence with $k = 2$, the first iteration gives

$$\rho_{12}^{*(2)} = r_1 - \Big\{\frac{2r_1(1 - r_1^2)}{1 + 2r_1^2}\Big\} r_2 = 0.38051.$$

The next iteration gives

$$\rho_{13}^{*(2)} = r_1 - \Big(\frac{2\rho_{12}^{*(2)}[1 - \{\rho_{12}^{*(2)}\}^2]}{1 + 2\{\rho_{12}^{*(2)}\}^2}\Big) r_2 = 0.38121.$$

This differs from $\rho_{12}^{*(2)}$ by a very small fraction of its asymptotic standard error, so that it does not matter which of the two estimates of $\rho$ is taken as $\tilde\rho$. (The value of the asymptotic standard error, calculated from formula (16) of §3, is 0.053; this, incidentally, is little more than 10% of the true value of 0.4 of the parameter $\rho$, so that the choice of $k = 2$ as an initial value is reasonable.)

With $\tilde\rho = 0.38051$, the equations (9) become

$$\begin{pmatrix} 1.28958 & 0.76102 & 0.14479 & \cdots \\ 0.76102 & 1.28958 & 0.76102 & \cdots \\ 0.14479 & 0.76102 & 1.28958 & \cdots \\ \vdots & & & \ddots \end{pmatrix}\begin{pmatrix} c_2 \\ c_3 \\ c_4 \\ \vdots \end{pmatrix} = -\begin{pmatrix} 0.65083 \\ 0.14479 \\ 0 \\ \vdots \end{pmatrix}.$$

The solutions of these for $k = 3, 4, 5$ are respectively

$$c' = (-0.67271,\ 0.28471), \quad (-0.73184,\ 0.41599,\ -0.16332), \quad \text{and} \quad (-0.75192,\ 0.46547,\ -0.24462,\ 0.09210).$$

The corresponding estimates of $\rho$ are

$$\hat\rho^{(3)} = 0.36879, \quad \hat\rho^{(4)} = 0.38498, \quad \hat\rho^{(5)} = 0.37934.$$
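The arithmetic of this illustration can be retraced in a few lines. The sketch below assumes NumPy and uses the five serial correlations quoted in the text; the function names are hypothetical, and `numpy.linalg.solve` replaces the hand solution of the equations (9).

```python
import numpy as np

r = np.array([0.35005, -0.06174, -0.08007, -0.14116, -0.15629])   # r_1, ..., r_5

def w_first_order(rho, k):
    """Leading k x k section of the matrix (11)."""
    lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    band = np.array([1.0 + 2.0 * rho ** 2, 2.0 * rho, rho ** 2, 0.0])
    w = band[np.minimum(lag, 3)]
    w[0, 0] = 1.0 - 3.0 * rho ** 2 + 4.0 * rho ** 4
    if k > 1:
        w[0, 1] = w[1, 0] = 2.0 * rho * (1.0 - rho ** 2)
    return w

def coefficients(rho, k):
    """Solve the equations (9) for c_12, ..., c_1k at a given rho."""
    w = w_first_order(rho, k)
    return np.linalg.solve(w[1:, 1:], -w[0, 1:])

def estimate(rho_arg, k):
    """r_1 + sum_{s=2}^{k} c_1s(rho_arg) r_s, as in (8)."""
    return r[0] + coefficients(rho_arg, k) @ r[1:k]

first = estimate(r[0], 2)       # first iteration with k = 2, starting from r_1
second = estimate(first, 2)     # next iteration
rho_tilde = 0.38051             # value held fixed for the remaining calculations
print(first, second)
for k in (3, 4, 5):
    print(k, coefficients(rho_tilde, k), estimate(rho_tilde, k))
```

Holding the argument of the coefficients fixed at $\tilde\rho$ means that each larger $k$ costs only one further linear solve.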


3. ASYMPTOTIC EFFICIENCY OF THE ESTIMATES

Let $v_{ii}^{(k)}$ be the variance of the limiting distribution of $\sqrt{n}(\hat\rho_i^{(k)} - \rho_i)$. Since this limiting distribution is the same as that of $\sqrt{n}\,r_{i\cdot}(r_{h+1}, r_{h+2}, \dots, r_k)$, we have

$$v_{ii}^{(k)} = w_{ii} + \sum_{s=h+1}^{k} c_{is} w_{is} = w_{ii} - \sum_{s,s'=h+1}^{k} w_{is} w_{is'} a_{(k)}^{ss'}, \tag{12}$$

where the matrix $\{a_{(k)}^{ss'}\}$ is the inverse of

$$A_{(k)} = \begin{pmatrix} w_{h+1,h+1} & \cdots & w_{h+1,k} \\ \vdots & & \vdots \\ w_{k,h+1} & \cdots & w_{kk} \end{pmatrix},$$

or

$$|A_{(k)}|\, v_{ii}^{(k)} = \begin{vmatrix} w_{ii} & w_{i,h+1} & \cdots & w_{ik} \\ w_{h+1,i} & & & \\ \vdots & & A_{(k)} & \\ w_{ki} & & & \end{vmatrix}. \tag{13}$$

Clearly $v_{ii}^{(k)}$ is a non-increasing function of $k$, $1 - \{v_{ii}^{(k)}/w_{ii}\}$ being the square of the asymptotic multiple correlation coefficient of $r_i$ on $r_{h+1}, r_{h+2}, \dots, r_k$, and therefore $\lim_{k\to\infty} v_{ii}^{(k)} = v_{ii}$, say, exists. In fact $n^{-1}v_{ii}$ $(i = 1, 2, \dots, h)$ will be the asymptotic variances of asymptotically efficient estimates of $\rho_1, \rho_2, \dots, \rho_h$. Similarly the asymptotic covariances of asymptotically efficient estimates will be $n^{-1}v_{ij}$ $(i \ne j)$, where $v_{ij} = \lim_{k\to\infty} v_{ij}^{(k)}$ and

$$v_{ij}^{(k)} = w_{ij} - \sum_{s,s'=h+1}^{k} w_{is} w_{js'} a_{(k)}^{ss'}, \tag{14}$$

the asymptotic covariance of $\sqrt{n}\,r_{i\cdot}(r_{h+1}, r_{h+2}, \dots, r_k)$ and $\sqrt{n}\,r_{j\cdot}(r_{h+1}, r_{h+2}, \dots, r_k)$. Explicit expressions for these asymptotic variances and covariances can be obtained by using a formula for the limits of elements of the variance-covariance matrix of consecutive observations from a moving-average process given by Walker (1961). For $w_{ij} = \lambda_{i-j}$ when $i, j > h$ (from equation (10)), so that $A_{(k)}$ is equal to the variance-covariance matrix of $k - h$ consecutive observations from a moving-average process whose covariance-generating function is

$$\Big\{1 + \sum_{r=1}^{h}\rho_r(z^r + z^{-r})\Big\}^2 = \Big(1 + \sum_{r=1}^{h}\beta_r z^r\Big)^2\Big(1 + \sum_{r=1}^{h}\beta_r z^{-r}\Big)^2\Big/\Big(1 + \sum_{r=1}^{h}\beta_r^2\Big)^2.$$

Hence by the lemma in Walker (1961), if $A_{(k)}^{-1} = \{a_{(k)}^{h+i,h+j}\}$ $(i, j = 1, 2, \dots, k-h)$,

$$a^{ij} = \lim_{k\to\infty} a_{(k)}^{h+i,h+j} = \Big(1 + \sum_{r=1}^{h}\beta_r^2\Big)^2\sum_{r=1}^{\min(i,j)}\delta_{i-r}\delta_{j-r},$$

where

$$\Big(1 + \sum_{r=1}^{h}\beta_r z^r\Big)^{-2} = \sum_{r=0}^{\infty}\delta_r z^r \quad (\text{for } |z| \le 1),$$

or, taking $i \ge j$,

$$a^{ij} = \Big(1 + \sum_{r=1}^{h}\beta_r^2\Big)^2\sum_{r=0}^{j-1}\delta_r\delta_{r+i-j}. \tag{15}$$

Also $w_{ij} = 0$ if $|i - j| > 2h$, so that from (12) and (14),

$$v_{ij}^{(k)} = w_{ij} - \sum_{s=h+1}^{2h+i}\ \sum_{s'=h+1}^{2h+j} w_{is} w_{js'} a_{(k)}^{ss'} \quad \text{when } k \ge \max(i, j) + 2h, \tag{16}$$

and therefore

$$v_{ij} = w_{ij} - \sum_{r=1}^{h+i}\ \sum_{r'=1}^{h+j} w_{i,h+r}\, w_{j,h+r'}\, a^{rr'}. \tag{17}$$

It should be possible to verify that $(v_{ij})$ is the inverse of the information matrix for $\rho_1, \rho_2, \dots, \rho_h$ given by Whittle’s method. However, there seemed to be no way of obtaining manageable expressions for the elements of the information matrix, the constant terms in the Fourier expansions of

$$\frac{1}{2}\,\frac{\partial\log g(\omega)}{\partial\rho_i}\,\frac{\partial\log g(\omega)}{\partial\rho_j},$$

for an arbitrary value of $h$. When $h = 1$, the verification is easily performed. For then

$$a^{11} = (1 + \beta^2)^2, \quad a^{12} = -2\beta(1 + \beta^2)^2, \quad a^{22} = (1 + 4\beta^2)(1 + \beta^2)^2,$$

so that

$$v_{11} = w_{11} - (1 + \beta^2)^2\{w_{12}^2 - 4\beta w_{12}w_{13} + (1 + 4\beta^2)w_{13}^2\} = 1 - 3\rho^2 + 4\rho^4 - \rho^2(1 + \beta^2)^2\{4(1 - \rho^2)^2 - 8\rho\beta(1 - \rho^2) + \rho^2(1 + 4\beta^2)\},$$

which with $\rho = \beta/(1 + \beta^2)$ gives

$$v_{11} = (1 - \beta^2)^3/(1 + \beta^2)^4. \tag{18}$$

Also

$$\frac{\partial\log g(\omega)}{\partial\beta} = \frac{e^{i\omega}}{1 + \beta e^{i\omega}} + \frac{e^{-i\omega}}{1 + \beta e^{-i\omega}},$$

so that the constant term in the expansion of $\tfrac{1}{2}[\{\partial\log g(\omega)\}/\partial\beta]^2$ is $(1 - \beta^2)^{-1}$ (cf. Whittle, 1953, p. 430). The constant term in the expansion of $\tfrac{1}{2}[\{\partial\log g(\omega)\}/\partial\rho]^2$ is therefore

$$\Big(\frac{\partial\rho}{\partial\beta}\Big)^{-2}(1 - \beta^2)^{-1} = (1 + \beta^2)^4/(1 - \beta^2)^3,$$

the reciprocal of (18).







Large-sample estimation of parameters for moving-average models 351 


The asymptotic efficiency of $\hat\rho_i^{(k)}$ is thus equal to $v_{ii}/v_{ii}^{(k)}$. If an overall measure of asymptotic efficiency is required, the ratio $|v_{ij}|/|v_{ij}^{(k)}|$ may be used. When $k$ is such that these asymptotic efficiencies are high (say not less than 0.80), little is to be gained by taking account of the information on $\rho$ contained in the sample serial correlations for lags exceeding $k$, especially as the accuracy of the approximation of the asymptotic distribution of the estimates to their actual distribution for a given finite value of $n$ may be expected to decrease as $k$ increases. When prior information on $\rho$ is available, this can be used to predict the value of $k$ required to give estimates of high efficiency. In the absence of prior information, the prediction can be based on the estimate of $\rho$ obtained with the small initial value of $k$. (In theory, of course, the value of $k$ should not depend on the observations themselves, but it seems unlikely that this will matter in practice.)

To illustrate the variation of asymptotic efficiency with $k$ and $\rho$, a number of values of this were computed for the case $h = 1$, and are given in Table 1.


Table 1. Asymptotic efficiency $E_k(\rho)$ of $\hat\rho^{(k)}$ for the first-order moving-average process

  |ρ|      k = 1*    k = 2     k = 3     k = 4

  0.10     0.960     0.999     1.000     1.000
  0.20     0.832     0.984     0.999     1.000
  0.25     0.732     0.958     0.995     0.999
  0.30     0.604     0.904     0.981     0.997
  0.35     0.451     0.801     0.943     0.985
  0.40     0.278     0.617     0.834     0.937
  0.425    0.190     0.480     0.723     0.869
  0.45     0.107     0.313     0.541     0.725

* For comparison the asymptotic efficiency $E_1(\rho)$ of $r_1$ is also given.
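Entries of this kind can be recomputed from the matrix (11) together with the limiting variance (18). The sketch below assumes NumPy and evaluates $E_k(\rho) = v_{11}/v_{11}^{(k)}$ with $v_{11}^{(k)}$ taken from (12); the function names are hypothetical and the code is an illustration rather than the computation originally used.

```python
import numpy as np

def w_first_order(rho, k):
    """Leading k x k section of the matrix (11)."""
    lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    band = np.array([1.0 + 2.0 * rho ** 2, 2.0 * rho, rho ** 2, 0.0])
    w = band[np.minimum(lag, 3)]
    w[0, 0] = 1.0 - 3.0 * rho ** 2 + 4.0 * rho ** 4
    if k > 1:
        w[0, 1] = w[1, 0] = 2.0 * rho * (1.0 - rho ** 2)
    return w

def efficiency(rho, k):
    """E_k(rho) = v_11 / v_11^(k) for the first-order moving-average process."""
    beta = (1.0 - np.sqrt(1.0 - 4.0 * rho ** 2)) / (2.0 * rho)
    v_limit = (1.0 - beta ** 2) ** 3 / (1.0 + beta ** 2) ** 4       # equation (18)
    w = w_first_order(rho, k)
    v_k = w[0, 0]
    if k > 1:
        v_k -= w[0, 1:] @ np.linalg.solve(w[1:, 1:], w[0, 1:])      # equation (12)
    return v_limit / v_k

for rho in (0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.425, 0.45):
    print(rho, [round(float(efficiency(rho, k)), 3) for k in (1, 2, 3, 4)])
```

The same function gives the efficiency for any larger $k$, which is convenient when choosing $k$ from a preliminary estimate of $\rho$.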


For a given value of $k$ the asymptotic efficiency decreases as $|\rho|$ increases, as one would expect. As $k$ increases the rate of decrease becomes very small except when $|\rho|$ is close to its largest possible value 0.5 (corresponding to $|\beta| = 1$), and the information on $\rho$ contained in the sample serial correlations for lags exceeding 3 or 4 is negligible except for $|\rho| > 0.4$. The main factor determining the value of $E_k(\rho)$ seems to be the magnitude of $|\beta|^{k+1}$, and the graphs of $E_k(\rho)$ plotted against $|\beta|^{k+1}$ for $k = 2, 3, 4$ are very similar (see Fig. 1). This is probably connected with the fact that the remainder after $k + 1$ terms of the autoregressive representation

$$\sum_{r=0}^{\infty}(-\beta)^r x_{t-r} = \epsilon_t \tag{19}$$

is $-(-\beta)^{k+1}\epsilon_{t-k-1}$, whose standard deviation is $|\beta|^{k+1}\sigma$ (cf. Durbin, 1959, p. 307). If $|\beta|^{k+1} < 0.05$, it should certainly be safe to conclude that the asymptotic efficiency is not less than 0.80.

In general the asymptotic efficiencies $E_{k,i}(\rho) = v_{ii}(\rho)/v_{ii}^{(k)}(\rho)$ will decrease as the moduli of the roots $z_i$ of the equation $z^h + \beta_1 z^{h-1} + \dots + \beta_h = 0$ increase. It is conjectured that for given $k$, the values of $E_{k,i}(\rho)$ will depend mainly on the magnitude of the remainder after $k + 1$ terms of the autoregressive representation which is the generalization of (19). It is easily seen that this remainder term can be written as

$$\sum_{i=1}^{h}\epsilon_{t-k-i}\Big(\sum_{j=i}^{h}\beta_j\gamma_{k+i-j}\Big), \tag{20}$$

where $\big(1 + \sum_{r=1}^{h}\beta_r z^r\big)^{-1} = \sum_{r=0}^{\infty}\gamma_r z^r$. The standard deviation of (20) is

$$\sigma\Big\{\sum_{i=1}^{h}\Big(\sum_{j=i}^{h}\beta_j\gamma_{k+i-j}\Big)^2\Big\}^{1/2},$$

and if this is small (say, less than 0.05), we should expect the $E_{k,i}(\rho)$ to be near unity.
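The standard deviation of the remainder is simple to evaluate once the $\gamma_r$ have been generated recursively. The following sketch assumes NumPy, takes $\sigma = 1$ by default, and uses hypothetical function names; for $h = 1$ it reduces to $|\beta|^{k+1}\sigma$.

```python
import numpy as np

def ar_coefficients(beta, m):
    """gamma_0, ..., gamma_m in the expansion of (1 + sum beta_r z^r)^(-1)."""
    beta = np.asarray(beta, dtype=float)
    h = len(beta)
    gamma = np.zeros(m + 1)
    gamma[0] = 1.0
    for r in range(1, m + 1):
        gamma[r] = -sum(beta[j - 1] * gamma[r - j] for j in range(1, min(h, r) + 1))
    return gamma

def remainder_sd(beta, k, sigma=1.0):
    """Standard deviation of the remainder (20) after k + 1 terms."""
    beta = np.asarray(beta, dtype=float)
    h = len(beta)
    if k < h - 1:
        raise ValueError("k must be at least h - 1")
    gamma = ar_coefficients(beta, k)
    coef = [sum(beta[j - 1] * gamma[k + i - j] for j in range(i, h + 1))
            for i in range(1, h + 1)]
    return sigma * float(np.sqrt(np.sum(np.square(coef))))

print(remainder_sd([0.5], 3))        # first-order case: 0.5 ** 4
print(remainder_sd([0.5, 0.2], 6))   # a second-order illustration
```

A rule of thumb in the spirit of the text would be to increase $k$ until this quantity falls below about 0.05.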
Fig. 1. Variation of asymptotic efficiency $E_k$ with $|\beta|^{k+1}$.


It should be noted that the formulae (12) and (14) at the beginning of this section will still apply when $k$ is allowed to tend to infinity with $n$ in a suitable way instead of being fixed. We then have estimates which are asymptotically efficient, equivalent to those obtained by Whittle’s method. In this case a convenient computational procedure would be to calculate successive estimates

$$\hat\rho^{(k)} = (\hat\rho_1^{(k)}, \hat\rho_2^{(k)}, \dots, \hat\rho_h^{(k)})' \quad (k = h+1, h+2, \dots)$$

from the formula

$$\hat\rho_i^{(k)} = r_i + \sum_{s=h+1}^{k} c_{is}(\hat\rho^{(k-1)})\, r_s, \tag{21}$$

starting with $\hat\rho^{(h)} = (r_1, r_2, \dots, r_h)'$. This procedure might in fact be regarded as a satisfactory substitute for the determination of estimates of $\beta_1, \beta_2, \dots, \beta_h$ by minimizing $\sum_{s=-k}^{k}\alpha_s C_s$. Incidentally it is possible to verify from (21) that the distribution of the estimates $\hat\rho_i^{(k)}$ is asymptotically normal by showing that the limiting distribution of $\sqrt{n}(\hat\rho_i^{(k)} - \rho_i)$ is the same as that of

$$\sqrt{n}\Big\{r_i - \rho_i + \sum_{s=h+1}^{k} c_{is}(\rho)\, r_s\Big\},$$

the normality of which can be established by an extension of the argument used by Diananda (1953, pp. 240-1). However, the chief advantage of the present approach is that it enables one to consider any particular finite value of $k$, and so to avoid the difficulty of deciding on the rate at which $k$ should tend to infinity.


4. ADJUSTMENTS FOR BIASES OF THE ESTIMATES 

It is well known that most large-sample estimates of parameters in time-series models which are functions of sample serial correlations (or have the same asymptotic distributions as such functions) are subject to biases which may not be negligible even for samples containing several hundred observations. This seems to be the case in the present problem but fortunately approximate allowances for these biases (which will be of order $n^{-1}$) can be made fairly easily for estimates equivalent to those given by the equations (8). To do so one uses the formula for the leading term in the expression for the bias of a sample serial correlation obtained by the usual series expansion method. Writing

$$n\{E(r_i) - \rho_i\} = \nu_i + o(1),$$

we have

$$\nu_i = 2(\rho_i\lambda_0 - \lambda_i) \tag{22}$$

when the process mean $\mu$ is specified a priori, and

$$\nu_i = 2(\rho_i\lambda_0 - \lambda_i) + (\rho_i - 1)\sum_{v=-\infty}^{\infty}\rho_v \tag{22a}$$

when $\mu$ is not specified, and is replaced by the sample mean $\bar{x}$ (see, for example, Lomnicki & Zaremba, 1957, p. 156).


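For

$$E\{\hat\rho_i^{(k)}\} - \rho_i = E(\delta r_i) + \sum_{s=h+1}^{k} E\{(c_{is}(\rho) + \delta c_{is})\,\delta r_s\},$$

where $\delta r_i = r_i - \rho_i$, $\delta c_{is} = c_{is}(\hat\rho^{(k)}) - c_{is}(\rho)$, which to order $n^{-1}$ is equal to

$$n^{-1}\Big\{\nu_i + \sum_{s=h+1}^{k} c_{is}(\rho)\,\nu_s\Big\} + E\Big(\sum_{s=h+1}^{k}\delta c_{is}\,\delta r_s\Big).$$

Also

$$\delta c_{is} = \sum_{j=1}^{h}\frac{\partial c_{is}}{\partial\rho_j}(\hat\rho_j^{(k)} - \rho_j) = \sum_{j=1}^{h}\frac{\partial c_{is}}{\partial\rho_j}\Big(\delta r_j + \sum_{s'=h+1}^{k} c_{js'}(\rho)\, r_{s'}\Big),$$

neglecting terms which are $o(n^{-1/2})$ (in probability), so that

$$E\Big(\sum_{s=h+1}^{k}\delta c_{is}\,\delta r_s\Big) = o(n^{-1}),$$

since

$$r_j + \sum_{s=h+1}^{k} c_{js}(\rho)\, r_s \quad (j = 1, 2, \dots, h)$$

is asymptotically uncorrelated with $r_s$ $(s = h+1, h+2, \dots, k)$. Thus

$$E(\hat\rho_i^{(k)}) - \rho_i = \Big\{\nu_i + \sum_{s=h+1}^{k} c_{is}(\rho)\,\nu_s\Big\} n^{-1} + o(n^{-1}). \tag{23}$$

23 Biom. 48

An allowance for bias, to be subtracted from $\hat\rho_i^{(k)}$, will then be given by the estimate

$$\Big\{\nu_i(\hat\rho^{(k)}) + \sum_{s=h+1}^{k} c_{is}(\hat\rho^{(k)})\,\nu_s(\hat\rho^{(k)})\Big\} n^{-1} \tag{24}$$

of the right-hand side of equation (23). The adjusted estimates therefore become

$$\hat\rho_i^{(k)} - 2\Big\{\hat\rho_i^{(k)}\lambda_0(\hat\rho^{(k)}) - \lambda_i(\hat\rho^{(k)}) - \sum_{s=h+1}^{k} c_{is}(\hat\rho^{(k)})\,\lambda_s(\hat\rho^{(k)})\Big\} n^{-1} - \Big[\sum_{v=-h}^{h}\hat\rho_v^{(k)}\Big\{\hat\rho_i^{(k)} - 1 - \sum_{s=h+1}^{k} c_{is}(\hat\rho^{(k)})\Big\}\Big] n^{-1}, \tag{25}$$

the term in the square bracket being included or omitted according as the $\nu_i$ are determined by (22a) or (22).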
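For a first-order process the allowance (24) involves only $\lambda_0 = 1 + 2\rho^2$, $\lambda_1 = 2\rho$, $\lambda_2 = \rho^2$ and the coefficients obtained from (9). The sketch below assumes NumPy and follows the sign convention of (8) and (9), under which the coefficients $c_{1s}$ alternate in sign starting with a negative $c_{12}$; the function names and the flag `mean_known` are hypothetical.

```python
import numpy as np

def w_first_order(rho, k):
    """Leading k x k section of the matrix (11)."""
    lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    band = np.array([1.0 + 2.0 * rho ** 2, 2.0 * rho, rho ** 2, 0.0])
    w = band[np.minimum(lag, 3)]
    w[0, 0] = 1.0 - 3.0 * rho ** 2 + 4.0 * rho ** 4
    if k > 1:
        w[0, 1] = w[1, 0] = 2.0 * rho * (1.0 - rho ** 2)
    return w

def bias_allowance(rho, k, n, mean_known=False):
    """n^-1 {nu_1 + sum_s c_1s nu_s} for h = 1, i.e. (24) evaluated at rho."""
    lam = {0: 1.0 + 2.0 * rho ** 2, 1: 2.0 * rho, 2: rho ** 2}   # lambda_s = 0 for s > 2
    rho_at = lambda s: rho if s == 1 else 0.0
    total = 1.0 + 2.0 * rho                                      # sum of rho_v over all v

    def nu(s):
        value = 2.0 * (rho_at(s) * lam[0] - lam.get(s, 0.0))     # formula (22)
        if not mean_known:
            value += (rho_at(s) - 1.0) * total                   # extra term of (22a)
        return value

    w = w_first_order(rho, k)
    c = np.linalg.solve(w[1:, 1:], -w[0, 1:])                    # c_12, ..., c_1k from (9)
    return (nu(1) + sum(c[s - 2] * nu(s) for s in range(2, k + 1))) / n

print(bias_allowance(0.4, 5, 100))                    # mean estimated from the sample
print(bias_allowance(0.4, 5, 100, mean_known=True))   # mean specified a priori
```

The adjusted estimate is obtained by subtracting the returned value, evaluated at $\hat\rho^{(k)}$, from $\hat\rho^{(k)}$; with $\rho = 0.4$, $k = 5$, $n = 100$ the allowance is about 0.0057 in magnitude and negative in sign.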
5. NUMERICAL EXAMPLES

The estimation process was applied to each of the twenty first-order series of 100 observations used by Durbin (1959) to illustrate his method, and referred to in §2. The value $k = 5$ was used in equation (8), and the coefficients $c_s$ were calculated using estimates of $\rho$ obtained with the initial value $k = 2$, so that

$$\hat\rho = r_1 + \sum_{s=2}^{5} c_s(\tilde\rho)\, r_s, \tag{26}$$

where

$$\tilde\rho = r_1 - \Big\{\frac{2r_1(1 - r_1^2)}{1 + 2r_1^2}\Big\} r_2.$$

The effect of using an estimate $\tilde\rho$ based on more than the first two sample serial correlations was investigated for the first five of these series, and found in all cases to be a negligibly small fraction of the asymptotic standard error of $\hat\rho$. The values of $\hat\rho$ and $\hat\beta = \{1 - (1 - 4\hat\rho^2)^{1/2}\}/2\hat\rho$ (the solution of the equation $\hat\rho = \hat\beta/(1 + \hat\beta^2)$ having modulus less than unity) are given in Table 2.


Table 2

  Series     ρ̂         β̂         Series     ρ̂         β̂
     1     0.37934    0.4594        11     0.40759    0.5162
     2     0.33968    0.3918        12     0.40725    0.5155
     3     0.33104    0.3785        13     0.41694    0.5373
     4     0.40458    0.5097        14     0.40761    0.5162
     5     0.39223    0.4842        15     0.26668    0.2889
     6     0.42798    0.5642        16     0.36176    0.4280
     7     0.41695    0.5373        17     0.34255    0.3964
     8     0.38212    0.4646        18     0.44588    0.6140
     9     0.40868    0.5186        19     0.35327    0.4138
    10     0.34431    0.3992        20     0.41163    0.5252

The means and variances (with divisor 19) of these sets of values are

               ρ̂           β̂
  Mean       0.38240     0.4730
  Variance   0.001826    0.006042
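The conversion from $\hat\rho$ to $\hat\beta$ and the summary statistics can be reproduced from the tabulated $\hat\rho$ values. The sketch below uses only the Python standard library; the variable and function names are hypothetical, and small differences in the last digit of $\hat\beta$ from the tabulated values are to be expected from rounding.

```python
import math
import statistics

# Estimates of rho for the twenty series, copied from Table 2.
rho_hat = [0.37934, 0.33968, 0.33104, 0.40458, 0.39223,
           0.42798, 0.41695, 0.38212, 0.40868, 0.34431,
           0.40759, 0.40725, 0.41694, 0.40761, 0.26668,
           0.36176, 0.34255, 0.44588, 0.35327, 0.41163]

def beta_from_rho(rho):
    """Root of rho = beta / (1 + beta^2) with modulus less than unity."""
    return (1.0 - math.sqrt(1.0 - 4.0 * rho ** 2)) / (2.0 * rho)

beta_hat = [beta_from_rho(p) for p in rho_hat]

# statistics.variance uses the divisor n - 1 = 19, as in the text.
print(statistics.mean(rho_hat), statistics.variance(rho_hat))
print(statistics.mean(beta_hat), statistics.variance(beta_hat))
```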


The large-sample variance of $\hat\rho$ is given by

$$100V(\hat\rho) = v_{11}^{(5)}(0.4) = 0.6224 - (0.7849 \times 0.6720 - 0.5110 \times 0.1600) = 0.1767$$

(using the formula at the beginning of §3), and that of $\hat\beta$ by

$$100V(\hat\beta) = 100V(\hat\rho)\Big(\frac{\partial\rho}{\partial\beta}\Big)^{-2} = 0.1767 \times (1.25)^4/(0.75)^2 = 0.7669.$$

(For $\beta = \tfrac{1}{2}$, the limiting value of $v_{11}$ given by (18) is 0.1728, so that the asymptotic efficiency of $\hat\rho$ or $\hat\beta$ is 0.1728/0.1767 = 0.978.) The observed variances are in good agreement with these large-sample variances, but the deviations of the observed means from the theoretical values $\rho = 0.4$, $\beta = 0.5$ are rather large, the ratio of the deviation to its large-sample standard error being $-0.0176/0.00940 = -1.88$ for $\hat\rho$ and $-0.0270/0.0196 = -1.38$ for $\hat\beta$.
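These large-sample quantities follow from (12) and the derivative of $\rho = \beta/(1 + \beta^2)$. A minimal sketch, assuming NumPy, with $\rho = 0.4$, $\beta = 0.5$, $n = 100$ and twenty series as in the text; the variable names are hypothetical.

```python
import numpy as np

def w_first_order(rho, k):
    """Leading k x k section of the matrix (11)."""
    lag = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    band = np.array([1.0 + 2.0 * rho ** 2, 2.0 * rho, rho ** 2, 0.0])
    w = band[np.minimum(lag, 3)]
    w[0, 0] = 1.0 - 3.0 * rho ** 2 + 4.0 * rho ** 4
    if k > 1:
        w[0, 1] = w[1, 0] = 2.0 * rho * (1.0 - rho ** 2)
    return w

rho, beta, n, n_series, k = 0.4, 0.5, 100, 20, 5
w = w_first_order(rho, k)
v_k = w[0, 0] - w[0, 1:] @ np.linalg.solve(w[1:, 1:], w[0, 1:])   # v_11^(5)(0.4), from (12)
drho_dbeta = (1.0 - beta ** 2) / (1.0 + beta ** 2) ** 2            # derivative of beta/(1 + beta^2)

var_rho = v_k / n                          # large-sample variance of a single estimate of rho
var_beta = var_rho / drho_dbeta ** 2       # corresponding variance for beta
se_mean_rho = np.sqrt(var_rho / n_series)  # standard error of the mean of twenty estimates
se_mean_beta = np.sqrt(var_beta / n_series)
print(100 * var_rho, 100 * var_beta, se_mean_rho, se_mean_beta)
```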


This suggests that the estimates are subject to negative biases which are not negligible. The allowances for bias of the estimates $\hat\rho$ given by the equation (24) will be approximately equal to the value of (23) when $\rho = 0.4$ (with $i = 1$), i.e.

$$10^{-2}\{2(0.4 \times 1.32 - 0.8 + 0.785 \times 0.16) + 1.8(-0.6 + 0.446)\} = -0.0057$$

(the values of the $c_s$ when $\rho = 0.4$ are $c_2 = 0.7849$, $c_3 = -0.5110$, $c_4 = 0.2798$, $c_5 = -0.1076$, so that then $\sum c_s = 0.446$). In magnitude this is rather more than half the large-sample standard error, and if we made the adjustment for bias suggested in §4, the ratio of the deviation $\hat\rho - 0.4$ to its large-sample standard error would be reduced to approximately $-0.0119/0.0094 = -1.3$. Similar allowances could be made for the biases of the estimates $\hat\beta$, using the approximation

$$\hat\rho - \rho = (\hat\beta - \beta)f'(\beta) + \tfrac{1}{2}(\hat\beta - \beta)^2 f''(\beta),$$

where $f(\beta) = \beta/(1 + \beta^2)$, so that

$$f'(\beta)E(\hat\beta - \beta) = E(\hat\rho - \rho) - (1 - \beta^2)f''(\beta)/2n. \tag{27}$$

For $\beta = 0.5$, $n = 100$, this gives $E(\hat\beta - \beta) \approx -0.020$, so that with these allowances the deviation $\hat\beta - 0.5$ would be only about 0.007, less than half the large-sample standard error.

It is interesting to compare these results with those obtained by Durbin’s method (1959, §8). His values of $\hat\beta$ are consistently smaller than the corresponding values in Table 2, their mean being 0.4531 compared with 0.4730, and the deviation of this from 0.5 is well over twice its large-sample standard error. This suggests that the present method of estimation is subject to less bias. It is, however, not at all clear why that should be so; certainly Durbin’s method involves two estimation processes into which bias may enter (the estimation of the parameters of the approximating autoregressive process and then the estimation of $\beta$ by a function of these estimated parameters), but so does the present method. A partial answer to the question can be given by obtaining an approximation, to order $n^{-1}$, to the bias of Durbin’s estimates by a method similar to that used in §4, and comparing this with the corresponding approximation to the bias of the present estimates derived from the right-hand side of equation (27). The approximation, which is derived in the Appendix, is, however, an extremely complicated function of $\beta$, and, because of the laborious computations required, the comparison was made only for $\beta = 0.5$, $k = 5$. In this case, the approximation to the bias of Durbin’s estimates becomes $-2.2n^{-1}$ compared with $-2.0n^{-1}$ for the bias of the present estimates, so that there seems to be little to choose between the two methods as regards bias. Hence it is probably fortuitous that for the twenty series of 100 observations the present method has given so much better results. It is incidentally interesting to note that the deviation of the mean of Durbin’s twenty estimates from expectation is reduced to 0.025 when allowance is made for bias using the above approximation, the ratio of this to the standard error of the mean being only 1.3.

The approximation in the Appendix can of course be used to adjust Durbin’s estimates for bias, but the complexity of the formula makes it somewhat inconvenient for practical application.


I wish to thank Mr J. Durbin and Miss J. May for providing me with the serial correlation 
coefficients for the twenty first-order series referred to in §5, and also Mr D. A. East and 
Mr K. A. Tucker for undertaking most of the computations. 


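APPENDIX

Approximate adjustments for biases of Durbin’s estimates of $\beta$ for a first-order moving-average process

Let $\hat\beta = -\sum_{i=0}^{k-1} a_i a_{i+1}\big/\sum_{i=0}^{k} a_i^2$, where the $a_i$ $(i > 0)$ are the estimates of the coefficients $\alpha_i$ of the approximating autoregressive process (Durbin, 1959, equation (7)), and $a_0 = 1$. Putting $\delta a_i = a_i - \alpha_i$ and using

$$\beta\alpha_{i-1} + (1 + \beta^2)\alpha_i + \beta\alpha_{i+1} = 0 \quad (i = 1, 2, \dots, k),$$

with $\alpha_0 = 1$, $\alpha_{k+1} = 0$ (Durbin, 1959, p. 307), we obtain

$$\hat\beta = \Big\{\beta + (1 - \beta^4)\beta^{-1}\sum_{i=1}^{k}\alpha_i\,\delta a_i - (1 - \beta^2)\sum_{i=1}^{k-1}\delta a_i\,\delta a_{i+1}\Big\}\Big\{1 + 2(1 - \beta^2)\sum_{i=1}^{k}\alpha_i\,\delta a_i + (1 - \beta^2)\sum_{i=1}^{k}(\delta a_i)^2\Big\}^{-1}, \tag{A1}$$

assuming $k$ to be large enough for $\sum_{i=0}^{k-1}\alpha_i\alpha_{i+1}$ to be replaced by $-\beta/(1 - \beta^2)$ and $\sum_{i=0}^{k}\alpha_i^2$ by $1/(1 - \beta^2)$.

On expanding the right-hand side of (A1) and retaining only terms whose degree in the $\delta a_i$ does not exceed 2, we find that to order $n^{-1}$,

$$\hat\beta - \beta = (1 - \beta^2)^2\beta^{-1}\sum_{i=1}^{k}\alpha_i\,\delta a_i - 2(1 - \beta^2)^3\beta^{-1}\Big(\sum_{i=1}^{k}\alpha_i\,\delta a_i\Big)^2 - (1 - \beta^2)\Big\{\beta\sum_{i=1}^{k}(\delta a_i)^2 + \sum_{i=1}^{k-1}\delta a_i\,\delta a_{i+1}\Big\}. \tag{A2}$$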

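Now if $a' = (a_1, a_2, \dots, a_k)$, $u' = (r_1, r_2, \dots, r_k)$, the $r_i$ being the serial correlation coefficients defined in §1, and $S = (s_{ij}) = (r_{i-j})$ $(i, j = 1, 2, \dots, k)$ (with $r_0 = 1$), we have

$$a'S = -u' \quad \text{or} \quad Sa = -u.$$

Let $\delta S = S - \Sigma$, where $\Sigma = (\sigma_{ij}) = (\rho_{i-j})$, $\delta a = a - \alpha$ and $\delta u' = u' - (\rho_1, \rho_2, \dots, \rho_k)$. Then

$$\delta S\,\alpha + \Sigma\,\delta a + \delta S\,\delta a = -\delta u, \tag{A3}$$

assuming that the autocorrelations of the moving-average process can be taken to be equal to those of the approximating autoregressive process, so that $\alpha'\Sigma = -(\rho_1, \rho_2, \dots, \rho_k)$. From (A3),

$$\delta a = -\Sigma^{-1}(\delta u + \delta S\,\alpha + \delta S\,\delta a) = -\Sigma^{-1}(\delta u + \delta S\,\alpha - \delta S\,\Sigma^{-1}\{\delta u + \delta S\,\alpha\}) + o(n^{-1}). \tag{A4}$$

Hence using the approximations

$$E(r_i - \rho_i) = n^{-1}\nu_i \quad (\text{equation (22), §4}),$$

$$E\{(r_i - \rho_i)(r_j - \rho_j)\} = n^{-1}w_{ij},$$

where $w_{ij}$ is defined by equation (5), §2, and taking $w_{ij} = 0$ if either $i$ or $j = 0$, we obtain the approximation

$$E(\delta a_i) = -n^{-1}\sum_{j=1}^{k}\sigma^{ij}\Big\{\nu_j + \sum_{r=1}^{k}\nu_{|j-r|}\alpha_r - \sum_{r,s=1}^{k}\sigma^{rs}w_{|j-r|,s} - \sum_{r,s,t=1}^{k}\sigma^{rs}\alpha_t\, w_{|j-r|,|s-t|}\Big\} = -n^{-1}\theta_i, \text{ say}. \tag{A5}$$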


Also using the expression for the asymptotic covariance matrix of the $a_i$ due to Mann and Wald (see, for example, Durbin, 1959, p. 307 or Hannan, 1960, p. 50), we may take

$$E(\delta a_i\,\delta a_j) = n^{-1}(1 + \beta^2)^{-1}\sigma^{ij}. \tag{A6}$$

Thus from (A5), (A6) and (A2) an approximation to the bias of $\hat\beta$ is given by

$$-nE(\hat\beta - \beta) = (1 - \beta^2)^2\beta^{-1}\sum_{i=1}^{k}\alpha_i\theta_i + \Big(\frac{1 - \beta^2}{1 + \beta^2}\Big)\Big\{2\beta^{-1}(1 - \beta^2)^2\sum_{i,j=1}^{k}\alpha_i\alpha_j\sigma^{ij} + \beta\sum_{i=1}^{k}\sigma^{ii} + \sum_{i=1}^{k-1}\sigma^{i,i+1}\Big\}. \tag{A7}$$

The right-hand side of (A7) with $\beta$ replaced by $\hat\beta$ will then provide an adjustment for bias. Computation of this right-hand side for $\beta = 0.5$, $k = 5$, yielded the value 2.15. Hence for the series considered in §5 the adjustments for bias to be applied to Durbin’s estimates should be about 0.02.

It was suggested by Durbin that some improvement might be obtained by changing the denominator in $\hat\beta$ from $\sum_{i=0}^{k} a_i^2$ to $\sum_{i=0}^{k-1} a_i^2$. It is easily seen that since $\alpha_k$ is very small, the effect of this change on the approximation to the bias is very nearly equivalent to subtracting a term $\beta(1 - \beta^2)(1 + \beta^2)^{-1}\sigma^{kk}$ from the right-hand side of (A7). For $\beta = 0.5$, $k = 5$, this term is equal to 0.37, so that the approximate bias becomes $-1.78n^{-1}$, a reduction of nearly 20%.
Linear and non-linear multiple comparisons in logit analysis


By OLAV REIERSØL
University of Oslo, Norway 


1. BERKSON’S MINIMUM LOGIT CHI-SQUARE METHOD DERIVED FROM NEYMAN’S REDUCED 
CHI-SQUARE AND LINEARIZATION METHOD 


The minimum logit chi-square method of estimation has been developed and discussed
by Berkson in a series of papers (Berkson, 1944, 1951, 1953, 1955). Taylor (1953) has shown
that the estimators obtained by this method are regular best asymptotically normal
(RBAN) estimators.* The same result can be obtained by showing that we may get
Berkson's method of estimation as an application of Neyman's $\chi_1^2$ and linearization
method (Neyman, 1949). What Neyman denotes by $\chi_1^2$ has been called the reduced
chi-square by Fix, Hodges & Lehmann (1959).

We consider a sequence of independent experiments divided into $s$ groups. Let $n_j$ be
the number of experiments in the $j$th group. In each of these $n_j$ experiments a non-stochastic
variable $x$ is given the value $x_j$. In each experiment we note whether an event $A$ occurs or
not. In biological assay the variable $x$ will be some function, for instance the logarithm, of
the dose or intensity of the stimulus applied to the experimental animal. The event $A$ may,
for instance, be the death of the animal. We suppose that the probability of $A$ has the same
value $p_j$ for all experiments in the $j$th group. We suppose that

$$p_j = \frac{1}{1 + e^{-(\alpha + \beta x_j)}}, \tag{1-1}$$

from which we get inversely

$$\alpha + \beta x_j = \log\frac{p_j}{1 - p_j}. \tag{1-2}$$


Let $q_j$ be the relative frequency of the event $A$ in the $j$th group of experiments. Expanding
$\log\{p_j/(1-p_j)\}$ in a Taylor series about $p_j = q_j$ and retaining only terms which are linear in
$p_j$ we get

$$\log\frac{p_j}{1-p_j} = \log\frac{q_j}{1-q_j} + \frac{p_j - q_j}{q_j(1-q_j)}. \tag{1-3}$$

Using (1-2) this may be written in the form

$$p_j - q_j = q_j(1-q_j)(\alpha + \beta x_j - u_j), \tag{1-4}$$

where $u_j = \log\{q_j/(1-q_j)\}$.

Neyman's reduced chi-square is in this case

$$\sum_{j=1}^{s} \frac{n_j(p_j - q_j)^2}{q_j(1-q_j)}. \tag{1-5}$$

If we minimize (1-5) subject to (1-2) we get non-linear equations for the estimators. We shall,
however, replace (1-2) by the linearized form (1-4). Substituting (1-4) in (1-5) we get

$$\sum_{j=1}^{s} n_j q_j(1-q_j)(\alpha + \beta x_j - u_j)^2. \tag{1-6}$$


* This concept was introduced by Neyman (1949) who used the term ‘best asymptotically normal 
estimates’. 
Neyman's reduced chi-square and linearization technique applied to the present case thus
means that we shall use as estimators $a$ and $b$ those values of $\alpha$ and $\beta$ which minimize (1-6).
We have thus arrived at Berkson's minimum logit chi-square method.

It should be noted, however, that Berkson has recommended a modification of
this method which consists in replacing each $q_j = 0$ by $q_j = 1/(2n_j)$ and each $q_j = 1$ by
$q_j = 1 - 1/(2n_j)$ (Berkson, 1955, pp. 152-7).

This modification does not change the asymptotic properties of the estimators. The 
statements of the present paper will be valid whether we consider Berkson’s original 
estimation method or the modified method just mentioned. 

If we do not assume (1-1) a priori, we may regard it as an hypothesis to be tested. Neyman
(1949) has presented several alternative tests each of which is equivalent in the limit to the
likelihood ratio test. We shall consider the test criterion

$$\sum_{j=1}^{s} n_j q_j(1-q_j)(a + b x_j - u_j)^2, \tag{1-7}$$

which has asymptotically a chi-square distribution with $s - 2$ degrees of freedom when the
hypothesis (1-1) is true. A test based on the test criterion (1-7) is not a special case of any
of Neyman's tests. It may, however, be shown that it is asymptotically equivalent to
Neyman's tests.
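The estimation step of this section can be sketched numerically. The following is a minimal illustration only, assuming the reading of (1-6) and (1-7) as weighted sums of squares with weights $n_j q_j(1-q_j)$ and applying Berkson's modification to empty or full groups; the function name `min_logit_chisq` and its argument layout are hypothetical and do not come from the paper.

```python
import math

def min_logit_chisq(x, n, events):
    """Minimise (1-6) over alpha and beta; return (a, b, criterion (1-7)).

    x      : value x_j of the non-stochastic variable in group j
    n      : number n_j of experiments in group j
    events : number of occurrences of the event A in group j
    """
    w, u = [], []
    for n_j, e_j in zip(n, events):
        q = e_j / n_j
        # Berkson's modification for groups with q_j = 0 or q_j = 1.
        if e_j == 0:
            q = 1.0 / (2 * n_j)
        elif e_j == n_j:
            q = 1.0 - 1.0 / (2 * n_j)
        w.append(n_j * q * (1.0 - q))       # weight n_j q_j (1 - q_j)
        u.append(math.log(q / (1.0 - q)))   # observed logit u_j

    # Normal equations of the weighted least squares problem.
    m0 = sum(w)
    m1 = sum(wi * xi for wi, xi in zip(w, x))
    m2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    s0 = sum(wi * ui for wi, ui in zip(w, u))
    s1 = sum(wi * ui * xi for wi, ui, xi in zip(w, u, x))
    det = m0 * m2 - m1 * m1
    a = (s0 * m2 - s1 * m1) / det
    b = (m0 * s1 - m1 * s0) / det

    # Test criterion (1-7): the minimised weighted sum of squares.
    chi2 = sum(wi * (a + b * xi - ui) ** 2 for wi, xi, ui in zip(w, x, u))
    return a, b, chi2
```

With only two dose levels the fitted line passes through both observed logits, so the criterion is zero and there are $s - 2 = 0$ degrees of freedom; at least three groups are needed for the test of (1-1) to be informative.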


2. A CHI-SQUARE TEST FOR DIFFERENCES BETWEEN SEQUENCES OF EXPERIMENTS 


We shall consider $k$ sequences $S_1, S_2, \ldots, S_k$ of experiments where each sequence is of
the type considered in the preceding section. The sequence $S_i$ is divided into $s_i$ groups. In
the $j$th of these groups $x$ is given the value $x_{ij}$ and we perform $n_{ij}$ experiments in each of
which the event $A$ is supposed to occur with probability

$$p_{ij} = \frac{1}{1 + e^{-(\alpha_i + \beta_i x_{ij})}}. \tag{2-1}$$

Let $q_{ij}$ be the corresponding relative frequency. We then get RBAN estimators of the $\alpha_i$
and $\beta_i$ by minimizing

$$\sum_{i=1}^{k}\sum_{j=1}^{s_i} w_{ij}(\alpha_i + \beta_i x_{ij} - u_{ij})^2, \tag{2-2}$$

where

$$w_{ij} = n_{ij}q_{ij}(1 - q_{ij}), \qquad u_{ij} = \log\frac{q_{ij}}{1 - q_{ij}}.$$

$u_{ij}$ is infinite when $q_{ij} = 0$ and when $q_{ij} = 1$. In both cases $w_{ij} = 0$. In such cases we may
use the procedure advocated by Berkson and replace each $q_{ij} = 0$ by $q_{ij} = 1/(2n_{ij})$ and replace
each $q_{ij} = 1$ by $q_{ij} = 1 - 1/(2n_{ij})$. Setting equal to zero the derivatives of (2-2) with respect
to the $\alpha_i$ and $\beta_i$, we get the following equations for determination of the estimators $a_i$ and $b_i$:

$$m_{0i}a_i + m_{1i}b_i = \sum_j w_{ij}u_{ij}, \tag{2-3}$$

$$m_{1i}a_i + m_{2i}b_i = \sum_j w_{ij}u_{ij}x_{ij}, \tag{2-4}$$

where

$$m_{0i} = \sum_j w_{ij}, \qquad m_{1i} = \sum_j w_{ij}x_{ij}, \qquad m_{2i} = \sum_j w_{ij}x_{ij}^2.$$

Let us consider the hypothesis

$$\alpha_1 = \alpha_2 = \ldots = \alpha_k = \alpha, \qquad \beta_1 = \beta_2 = \ldots = \beta_k = \beta. \tag{2-5}$$








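By analogy with the tests of Neyman (1949) we may use a test criterion which is the difference
between the minimum of

$$\sum_{i,j} w_{ij}(\alpha + \beta x_{ij} - u_{ij})^2 \tag{2-6}$$

with respect to $\alpha$ and $\beta$, and the minimum of (2-2) with respect to the $\alpha_i$ and $\beta_i$, which is

$$\sum_{i,j} w_{ij}(a_i + b_i x_{ij} - u_{ij})^2. \tag{2-7}$$

If we set

$$m_0 = \sum_{i,j} w_{ij}, \qquad m_1 = \sum_{i,j} w_{ij}x_{ij}, \qquad m_2 = \sum_{i,j} w_{ij}x_{ij}^2,$$

the values $a$ and $b$ of $\alpha$ and $\beta$ minimizing (2-6) will be determined by the equations

$$m_0 a + m_1 b = \sum_{i,j} w_{ij}u_{ij}, \tag{2-8}$$

$$m_1 a + m_2 b = \sum_{i,j} w_{ij}u_{ij}x_{ij}, \tag{2-9}$$

and the minimum of (2-6) will be

$$\sum_{i,j} w_{ij}(a + b x_{ij} - u_{ij})^2. \tag{2-10}$$

The difference between (2-10) and (2-7) is equal to

$$\sum_{i,j} w_{ij}\{a_i - a + (b_i - b)x_{ij}\}^2,$$

which is again equal to

$$\sum_{i=1}^{k}\{m_{0i}(a_i - a)^2 + 2m_{1i}(a_i - a)(b_i - b) + m_{2i}(b_i - b)^2\}. \tag{2-11}$$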
This test criterion has asymptotically a chi-square distribution with $2k - 2$ degrees of freedom
when the hypothesis (2-5) is true. Let us use the term 'accept' as a synonym of 'not
reject'. Let $C_{\epsilon,k}$ be the upper $\epsilon$-fractile of a chi-square distribution with $k$ degrees of freedom.
Then if we accept the hypothesis (2-5) when

$$\sum_{i=1}^{k}\{m_{0i}(a_i - a)^2 + 2m_{1i}(a_i - a)(b_i - b) + m_{2i}(b_i - b)^2\} \le C_{\epsilon,2k-2}, \tag{2-12}$$

there is a probability asymptotically equal to $1 - \epsilon$ that we shall accept (2-5) when it is true.

The test described is asymptotically equivalent to the likelihood ratio test.
Adding equations (2-3) for $i = 1, 2, \ldots, k$ and subtracting (2-8) from the result we get

$$\sum_i m_{0i}(a_i - a) + \sum_i m_{1i}(b_i - b) = 0. \tag{2-13}$$

Similarly we get

$$\sum_i m_{1i}(a_i - a) + \sum_i m_{2i}(b_i - b) = 0. \tag{2-14}$$
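The relations of this section lend themselves to a numerical check. The sketch below is illustrative only: it assumes each sequence is supplied as three lists of weights $w_{ij}$, values $x_{ij}$ and logits $u_{ij}$, and the names `fit_line` and `homogeneity_criterion` are hypothetical. It computes the criterion (2-11) and also returns the difference between the pooled minimum (2-10) and the separate minimum (2-7), which should agree with it.

```python
def fit_line(w, x, u):
    """Weighted least squares fit of u on x; returns (a, b, (m0, m1, m2), rss)."""
    m0 = sum(w)
    m1 = sum(wi * xi for wi, xi in zip(w, x))
    m2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    s0 = sum(wi * ui for wi, ui in zip(w, u))
    s1 = sum(wi * ui * xi for wi, ui, xi in zip(w, u, x))
    det = m0 * m2 - m1 * m1
    a = (s0 * m2 - s1 * m1) / det
    b = (m0 * s1 - m1 * s0) / det
    rss = sum(wi * (a + b * xi - ui) ** 2 for wi, xi, ui in zip(w, x, u))
    return a, b, (m0, m1, m2), rss

def homogeneity_criterion(sequences):
    """sequences: list of (w, x, u) triples, one per sequence S_i.

    Returns (criterion (2-11), (2-10) minus (2-7), per-sequence fits, (a, b)).
    """
    fits = [fit_line(w, x, u) for w, x, u in sequences]
    # Pooled fit under the hypothesis (2-5).
    w_all = [wi for w, _, _ in sequences for wi in w]
    x_all = [xi for _, x, _ in sequences for xi in x]
    u_all = [ui for _, _, u in sequences for ui in u]
    a, b, _, rss_pooled = fit_line(w_all, x_all, u_all)
    criterion = sum(
        m0 * (ai - a) ** 2 + 2 * m1 * (ai - a) * (bi - b) + m2 * (bi - b) ** 2
        for ai, bi, (m0, m1, m2), _ in fits
    )
    rss_separate = sum(f[3] for f in fits)
    return criterion, rss_pooled - rss_separate, fits, (a, b)
```

The criterion would then be compared with the fractile $C_{\epsilon,2k-2}$, which has to be taken from chi-square tables or a statistics library; it is deliberately left outside this sketch.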


3. LINEAR MULTIPLE COMPARISONS 


If we reject the hypothesis (2-5) we shall usually be interested in knowing which $\alpha$'s
and $\beta$'s are different. We may also wish to compare two $p$-values corresponding to the same
value of $x$ in two different sequences of experiments. We may thus wish to test the hypotheses

$$\beta_i = \beta_j, \tag{3-1, $ij$}$$

$$\alpha_i + \beta_i x = \alpha_j + \beta_j x. \tag{3-2, $ijx$}$$

We may do this by means of an extension of the Scheffé multiple test (Scheffé, 1953, 1959).
Let $z$ be a real column vector with $n$ elements and let $B$ be a real symmetric matrix which
is positive definite. The Scheffé multiple test is based on the equivalence between the single
inequality

$$z'B^{-1}z \le c^2 \tag{3-3}$$

and the set of all inequalities

$$|h'z| \le c(h'Bh)^{1/2} \quad \text{for all real column vectors } h. \tag{3-4}$$

We shall first prove that (3-3) and (3-4) are equivalent when $z$, $B$, $h$, $c$ are all constant.
We shall give a purely algebraic proof of this equivalence.

Since $B$ is positive definite, it may be written as a product $A'A$ where $A$ is a real square
matrix. Then $B^{-1} = A^{-1}(A^{-1})'$. If we set $v = (A^{-1})'z$ and $g = Ah$, the inequalities (3-3)
and (3-4) can be rewritten in the form

$$v'v \le c^2 \tag{3-5}$$

and

$$|g'v| \le c(g'g)^{1/2} \quad \text{for all real column vectors } g. \tag{3-6}$$

We shall now show that (3-5) and (3-6) are equivalent. Suppose first that (3-5) holds. Using
the inequality of Cauchy (usually called the inequality of Schwarz) we get

$$(g'v)^2 \le (g'g)(v'v) \le c^2(g'g),$$

which gives (3-6).
Suppose next that (3-6) holds. Then we may set in (3-6) $g = v$ and get

$$v'v \le c(v'v)^{1/2},$$

from which (3-5) follows. The equivalence is thus proved.
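The equivalence can be illustrated numerically in the two-dimensional case. The sketch below is only an illustration under the assumption that $B$ is a symmetric positive definite $2 \times 2$ matrix given as nested tuples; the helper names are hypothetical. It uses the fact that $(h'z)^2/(h'Bh)$ never exceeds $z'B^{-1}z$ and attains that value at $h = B^{-1}z$, which is the content of (3-3) and (3-4) with $c^2 = z'B^{-1}z$.

```python
def quad_form_inverse(B, z):
    """Return z' B^{-1} z for a symmetric positive definite 2x2 matrix B."""
    (b11, b12), (_, b22) = B
    det = b11 * b22 - b12 * b12
    return (b22 * z[0] ** 2 - 2 * b12 * z[0] * z[1] + b11 * z[1] ** 2) / det

def scheffe_ratio(B, z, h):
    """Return (h'z)^2 / (h'Bh) for a non-zero vector h."""
    (b11, b12), (_, b22) = B
    hz = h[0] * z[0] + h[1] * z[1]
    hBh = b11 * h[0] ** 2 + 2 * b12 * h[0] * h[1] + b22 * h[1] ** 2
    return hz * hz / hBh

def maximising_h(B, z):
    """Return h = B^{-1} z, at which the ratio attains its supremum."""
    (b11, b12), (_, b22) = B
    det = b11 * b22 - b12 * b12
    return ((b22 * z[0] - b12 * z[1]) / det,
            (b11 * z[1] - b12 * z[0]) / det)
```

For example, with `B = ((2.0, 0.5), (0.5, 1.0))` and `z = (1.0, -2.0)`, every trial vector `h` gives a ratio no larger than `quad_form_inverse(B, z)`, and `maximising_h(B, z)` reaches it.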

Suppose next that a sample space $M$ is given. Let $z$, $B$ and $c$ be random variables on the
space $M$, and let $H$ be a family of random variables $h$ on the space $M$. Applying the equivalence
between (3-3) and (3-4) to each point $X$ of $M$, we find that if the inequality (3-3)
holds at $X$, then the set of inequalities

$$|h'z| \le c(h'Bh)^{1/2} \quad \text{for all } h \text{ belonging to } H \tag{3-7}$$

holds at $X$. Hence the set of points

$$\{X: z'B^{-1}z \le c^2\} \tag{3-8}$$

is contained in the set

$$\{X: |h'z| \le c(h'Bh)^{1/2} \text{ for all } h \text{ belonging to } H\}. \tag{3-9}$$

If $\Pr\{A\}$ denotes the probability of the statement $A$ we thus have

$$\Pr\{|h'z| \le c(h'Bh)^{1/2} \text{ for all } h \text{ belonging to } H\} \ge \Pr\{z'B^{-1}z \le c^2\}. \tag{3-10}$$

Suppose next that the family $H$ is such that

For any real vector $k$ with $n$ components and any point $X$ of $M$ there exists a
random vector $h$ of the family $H$ which takes the value $k$ at the point $X$. (3-11)

Then (3-3) and (3-7) are equivalent at each point $X$ of $M$, hence the sets (3-8) and (3-9) are
identical, and we have

$$\Pr\{|h'z| \le c(h'Bh)^{1/2} \text{ for all } h \text{ belonging to } H\} = \Pr\{z'B^{-1}z \le c^2\}. \tag{3-12}$$

Evidently (3-12) still holds if (3-11) holds with $M$ replaced by a subset of $M$ of probability
one.

Since a constant is a particular kind of random variable, the statements above are still
valid if some of the elements of $z$, $B$, $c$ and the vectors of $H$ are random and some of them
constants. If $H$ contains all constant vectors with $n$ components, then (3-11) holds, hence
also (3-12) holds.

What we have presented here represents an extension of the results of Scheffé in the 
following respects: 

(i) We do not assume that B is proportional to the covariance matrix of z. Actually, in 
the applications considered in the present paper, B is not proportional to the covariance 
matrix of z. 

(ii) If the elements of $z$ are subject to linear restrictions, our proof is still valid, while in
the proofs given by Scheffé the existence of the inverse of the covariance matrix implies
that the elements of $z$ are linearly independent.

(iii) In the presentation of Scheffé the vectors h are supposed to be non-random. 

It should be noted that Scheffé (1959, pp. 273-4) has considered the case of a random 
matrix B. 

Using the results presented above and setting

$$\begin{pmatrix} m^{0i} & m^{1i} \\ m^{1i} & m^{2i} \end{pmatrix} = \begin{pmatrix} m_{0i} & m_{1i} \\ m_{1i} & m_{2i} \end{pmatrix}^{-1},$$

we conclude that at each point of the sample space the inequality (2-12) is equivalent to
the set of all inequalities

$$\Bigl|\sum_i \{h_{1i}(a_i - a) + h_{2i}(b_i - b)\}\Bigr| \le \Bigl\{C_{\epsilon,2k-2}\sum_i (m^{0i}h_{1i}^2 + 2m^{1i}h_{1i}h_{2i} + m^{2i}h_{2i}^2)\Bigr\}^{1/2} \tag{3-13}$$

for all real values of $h_{11}, \ldots, h_{1k}, h_{21}, \ldots, h_{2k}$. Because of the linear relations (2-13) and (2-14)
there is an infinity of different sets of coefficients $h_{1i}$ and $h_{2i}$ giving the same function of the
$a_i$ and $b_i$. We have, for instance, if $k_1$ and $k_2$ are arbitrary real numbers

$$b_1 - b_2 = (1 + k_1 m_{11} + k_2 m_{21})(b_1 - b) - (1 - k_1 m_{12} - k_2 m_{22})(b_2 - b) + \sum_{i=3}^{k}(k_1 m_{1i} + k_2 m_{2i})(b_i - b) + \sum_{i=1}^{k}(k_1 m_{0i} + k_2 m_{1i})(a_i - a). \tag{3-14}$$


The corresponding sum on the right-hand side of (3-13) is

$$m^{21} + m^{22} + f(k_1, k_2), \tag{3-15}$$

where

$$f(k_1, k_2) = \sum_{i=1}^{k}(m_{0i}k_1^2 + 2m_{1i}k_1k_2 + m_{2i}k_2^2) \tag{3-16}$$

is a positive definite quadratic form. We thus get a set of inequalities

$$|b_1 - b_2| \le [C_{\epsilon,2k-2}\{m^{21} + m^{22} + f(k_1, k_2)\}]^{1/2} \tag{3-17}$$

which belongs to the set (3-13). The set (3-17) is equivalent to the single inequality which we
get from (3-17) when $f(k_1, k_2)$ is given its minimum value, which is 0. The set of inequalities
(3-17) is therefore equivalent to the single inequality

$$|b_1 - b_2| \le \{C_{\epsilon,2k-2}(m^{21} + m^{22})\}^{1/2}.$$

More generally we get as special cases of (3-13)

$$|b_i - b_j| \le \{C_{\epsilon,2k-2}(m^{2i} + m^{2j})\}^{1/2}, \tag{3-18, $ij$}$$

$$|a_i + b_i x - (a_j + b_j x)| \le \{C_{\epsilon,2k-2}(m^{0i} + 2m^{1i}x + m^{2i}x^2 + m^{0j} + 2m^{1j}x + m^{2j}x^2)\}^{1/2}. \tag{3-19, $ijx$}$$
If for all values of $i$, $j$ and $x$ we accept the hypothesis (3-1, $ij$) when (3-18, $ij$) holds and
(3-2, $ijx$) when (3-19, $ijx$) holds then, if the hypothesis (2-5) is true, there is an asymptotic
probability greater than or equal to $1 - \epsilon$ that all the hypotheses (3-1, $ij$) and (3-2, $ijx$) will
be correctly accepted for all values of $i$, $j$ and $x$. This statement follows from an application
of (3-10).


4. NON-LINEAR MULTIPLE COMPARISONS 


We shall consider a particular kind of non-linear comparison which arises when we
wish to compare the medians of tolerance distributions corresponding to different sequences
of experiments (cf. Finney, 1947, Chapter 4), or more generally if we wish to compare
fractiles of tolerance distributions. The value of $x$ corresponding to a given probability
$p$ of $A$ in the $i$th sequence is

$$\frac{1}{\beta_i}\Bigl(\log\frac{p}{1-p} - \alpha_i\Bigr) \quad \text{or} \quad \frac{1}{\beta_i}(y - \alpha_i)$$

if we set $y = \log\{p/(1-p)\}$. We consider the hypothesis

$$\frac{1}{\beta_i}(y - \alpha_i) = \frac{1}{\beta_j}(y - \alpha_j),$$

which may preferably be written in the form

$$(y - \alpha_i)\beta_j - (y - \alpha_j)\beta_i = 0. \tag{4-1, $ijy$}$$

In the testing of this hypothesis we shall use the test criterion

$$T_{ijy} = (y - a_i)b_j - (y - a_j)b_i. \tag{4-2, $ijy$}$$

For any real number $t$ we have the identity

$$(y - a_i)b_j - (y - a_j)b_i = (ta_i + (1-t)a_j - y)(b_i - b_j) + (tb_i + (1-t)b_j)(a_j - a_i). \tag{4-3}$$

We know that the set of inequalities (3-13) follows from (2-12) also in the case when the
$h_{1i}$ and $h_{2i}$ are random variables. Setting

$$-h_{1i} = h_{1j} = tb_i + (1-t)b_j, \qquad h_{2i} = -h_{2j} = ta_i + (1-t)a_j - y,$$

$h_{1m} = h_{2m} = 0$ when $m$ is different from $i$ and $j$, we get from (3-13)

$$|T_{ijy}| \le (C_{\epsilon,2k-2}\,G_{ijy}(t))^{1/2}, \tag{4-4}$$

where

$$G_{ijy}(t) = (m^{0i} + m^{0j})(tb_i + (1-t)b_j)^2 + (m^{2i} + m^{2j})(ta_i + (1-t)a_j - y)^2 - 2(m^{1i} + m^{1j})(tb_i + (1-t)b_j)(ta_i + (1-t)a_j - y).$$

When $i$, $j$ and $y$ are given, (4-4) represents a set of inequalities, one for each value of $t$. This
set is equivalent to the single inequality

$$|T_{ijy}| \le (C_{\epsilon,2k-2}\min_t G_{ijy}(t))^{1/2}. \tag{4-5, $ijy$}$$

Using again (3-10) we may make the following statement: If we accept the hypothesis
(4-1, $ijy$) when (4-5, $ijy$) holds, and if we accept the hypotheses (3-1, $ij$) and (3-2, $ijx$) as
described in the preceding section, then, if the hypothesis (2-5) is true, the asymptotic
probability is greater than or equal to $1 - \epsilon$ that we shall simultaneously accept the
hypotheses (4-1, $ijy$), (3-1, $ij$) and (3-2, $ijx$) for all values of $i$, $j$, $x$, $y$.
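Since $G_{ijy}(t)$ is a quadratic in $t$, the minimum required by (4-5, $ijy$) has a closed form. The sketch below is an illustration under assumptions: `m0`, `m1`, `m2` stand for the sums $m^{0i} + m^{0j}$, $m^{1i} + m^{1j}$, $m^{2i} + m^{2j}$ of inverse-matrix elements, the fractile `c_fractile` is supplied by the caller, and all function names are hypothetical.

```python
import math

def t_statistic(ai, bi, aj, bj, y):
    """Test criterion (4-2): T = (y - a_i) b_j - (y - a_j) b_i."""
    return (y - ai) * bj - (y - aj) * bi

def g_value(t, ai, bi, aj, bj, y, m0, m1, m2):
    """G_{ijy}(t) as defined after (4-4)."""
    hb = t * bi + (1 - t) * bj
    ha = t * ai + (1 - t) * aj - y
    return m0 * hb * hb + m2 * ha * ha - 2 * m1 * hb * ha

def g_minimum(ai, bi, aj, bj, y, m0, m1, m2):
    """Return (t*, min_t G). G is written as c2 t^2 + c1 t + c0."""
    da, db = ai - aj, bi - bj
    a0, b0 = aj - y, bj
    c2 = m0 * db * db + m2 * da * da - 2 * m1 * db * da
    c1 = 2 * (m0 * b0 * db + m2 * a0 * da - m1 * (b0 * da + a0 * db))
    c0 = m0 * b0 * b0 + m2 * a0 * a0 - 2 * m1 * b0 * a0
    if c2 <= 0.0:
        # Only possible when a_i = a_j and b_i = b_j: G does not depend on t.
        return None, c0
    t_star = -c1 / (2 * c2)
    return t_star, c0 - c1 * c1 / (4 * c2)

def accept_equal_fractiles(ai, bi, aj, bj, y, m0, m1, m2, c_fractile):
    """Apply (4-5): accept (4-1) when |T| <= (C * min_t G)^(1/2)."""
    _, g_min = g_minimum(ai, bi, aj, bj, y, m0, m1, m2)
    return abs(t_statistic(ai, bi, aj, bj, y)) <= math.sqrt(c_fractile * g_min)
```

Taking $y = 0$ corresponds to comparing the medians of the two tolerance distributions.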
5. ADDITIONAL REMARKS


There has been some controversy about the relative merits of Berkson’s minimum logit 
chi-square method and the maximum likelihood method in logit analysis. We may par- 
ticularly mention a paper by Silverstone (1957) and an answer to this paper by Berkson 
(1960). 

The methods of the present paper may also be applied if we use maximum likelihood 
estimators or another kind of RBAN estimators in the test criterion (2-11). The methods 
may similarly be applied in probit analysis if we insert the estimators in an approximate 
chi-square obtained by a linearization analogous to that used in the present paper. 
Estimation of the normal population parameters given
a type I censored sample *


By J. G. SAW
University College London


1. INTRODUCTION


Of a sample of size $n$ taken from a normal population with mean $\mu$ and standard deviation
$\sigma$, only those observations which lie numerically between pre-assigned values $d_1$ and $d_2$
($> d_1$) are available for statistical purposes. We suppose that a typical experiment yields
the result $x_1 < \ldots < x_r < d_1 < x_{r+1} < \ldots < x_s < d_2 < x_{s+1} < \ldots < x_n$, so that $s - r$ observations
are available to us and that it is required to estimate $\mu$ and $\sigma$. Of course $r$ and $s$ are
themselves dependent random variables; in fact $r$ and $(s - r)$ have a trinomial distribution. The
maximum likelihood estimators are investigated but prove bothersome to handle
computationally.

According to Gupta's (1952) definition the sample above is described as being type I
censored. An alternative form of censoring, defined in terms of fixed values $r'$ and $s'$, is as
follows: of an ordered sample $x_1 < x_2 < \ldots < x_n$ we observe (or make use of) only those
variates $x_i$ for which $r' < i \le s'$. The problem of estimation given a type II censored sample
has received much attention (Gupta, 1952; Sarhan & Greenberg, 1956; Plackett, 1958;
Saw, 1958, 1959). It seems to have been assumed quite widely that having observed a type I
sample we may deal with it as though we have planned, prior to experimentation, that
exactly $r$ observations should be missing from the lower end of the sample and $n - s$
observations from the upper end. Be this as it may, in dealing with type I censored samples as if
they were type II censored, we seem to be ignoring the wealth of information stored in the
experimental realization that $r$ observations fell below $d_1$ and that $n - s$ observations fell
above $d_2$ and, in fact, in this paper we use this information alone to propose very simple,
efficient (and indeed, familiar) estimators for $\mu$ and $\sigma$.

If $d_1$ and $d_2$ are finite, then the sample is said to be type I censored above and below; if
$d_1 = -\infty$ the sample is said to be type I censored above only. This latter situation is also
considered in the following sections. Certain symmetries exist which may be observed if
we multiply all observations by $-1$, when we see that type I censoring above at $d_2$
corresponds to type I censoring below at $-d_2$. Similarly a procedure which holds for a sample
type I censored below at $d_1$ and above at $d_2$ will hold for a sample type I censored below at
$-d_2$ and above at $-d_1$.

When $n$ is finite there is a non-zero probability that $s = r$, so that no observations are
available, and in this case it is not possible to give a point estimate for $(\mu, \sigma)$. In the finite
sample case therefore a procedure has to be adopted should such an instance arise; for
example, the experimenter may choose to augment his sample by taking a further $n'$
observations or to continue sampling item by item until one observation falls within the
range $d_1$ to $d_2$. This indeterminacy is of importance for small sample sizes in that the bias


* The research reported here was supported in part by the Department of United States Army 
and was carried out at the University of North Carolina, Department of Bio-Statistics. 
and variance of the estimators will be affected according as to what procedure is adopted.
In the work which follows, however, we will be concerned only with large sample results so 
that the probability of a ‘null’ sample may be conveniently held to be zero. 


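2. NOTATION


Let

$$f(x) = \frac{1}{\sigma\sqrt{(2\pi)}}\exp\Bigl\{-\tfrac12\Bigl(\frac{x-\mu}{\sigma}\Bigr)^2\Bigr\}, \qquad F(x) = \int_{-\infty}^{x} f(u)\,du, \tag{2-1}$$

$$z(x) = \frac{1}{\sqrt{(2\pi)}}\exp(-\tfrac12 x^2), \qquad P(x) = \int_{-\infty}^{x} z(u)\,du, \tag{2-2}$$

$$d_1 = \mu + \sigma t_1, \qquad d_2 = \mu + \sigma t_2, \tag{2-3}$$

$$F(d_1) = P(t_1) = p_1 = 1 - q_1, \qquad F(d_2) = P(t_2) = p_2 = 1 - q_2, \tag{2-4}$$

$$x_i = \mu + \sigma v_i \quad (i = r+1, \ldots, s). \tag{2-5}$$

The observed sample is $(x_r) < d_1 < x_{r+1} < \ldots < x_s < d_2 < (x_{s+1})$; let $y_{r+1}, y_{r+2}, \ldots, y_s$
be a random arrangement of $x_{r+1} < x_{r+2} < \ldots < x_s$ and write

$$y_i = \mu + \sigma u_i \quad (i = r+1, \ldots, s). \tag{2-6}$$

We see that $\{v_i\}$ is an ordered set drawn from the standard normal population and satisfies
$t_1 < v_{r+1} < \ldots < v_s < t_2$ and that $\{u_i\}$ is a randomization of $\{v_i\}$. We will write

$$\hat p_1 = r/n, \qquad \hat p_2 = s/n. \tag{2-7}$$

It will be convenient to use

$$z_i = z(t_i) \quad (i = 1, 2) \tag{2-8}$$

and, where no limits are given, $\sum$ is used as an abbreviation for $\sum_{r+1}^{s}$.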


3. PROBABILITY DISTRIBUTION FUNCTION OF THE VARIATES $\{u_i\}$ AND $\{v_i\}$


We have that

$$p(r, s) = \frac{n!}{r!\,(s-r)!\,(n-s)!}\,p_1^r(p_2 - p_1)^{s-r}(1 - p_2)^{n-s}, \tag{3-1}$$

from which we may obtain moments and product moments of $r$ and $s$. Also

$$p(v_{r+1}, \ldots, v_s, r, s) = \frac{n!}{r!\,(n-s)!}\,z(v_{r+1}) \ldots z(v_s)\,p_1^r(1 - p_2)^{n-s}, \tag{3-2}$$

$$p(v_i, r, s) = \frac{n!}{r!\,(i-r-1)!\,(s-i)!\,(n-s)!}\,p_1^r(1-p_2)^{n-s}\,z(v_i)\,(P(v_i) - p_1)^{i-r-1}(p_2 - P(v_i))^{s-i}, \tag{3-3}$$

$$p(v_i \mid r, s) = \frac{(s-r)!}{(i-r-1)!\,(s-i)!}\,\frac{z(v_i)\,(P(v_i) - p_1)^{i-r-1}(p_2 - P(v_i))^{s-i}}{(p_2 - p_1)^{s-r}}. \tag{3-4}$$

Since $\{u_i\}$ is a randomization of $\{v_i\}$ $(i = r+1, \ldots, s)$

$$p(u_{r+1}, \ldots, u_s \mid r, s) = z(u_{r+1}) \ldots z(u_s)/(p_2 - p_1)^{s-r}, \tag{3-5}$$

$$p(u_i \mid r, s) = z(u_i)/(p_2 - p_1). \tag{3-6}$$

It is obvious from (3-4) and (3-6) that the variable $u_i$ is much more simple to deal with
than the variable $v_i$. We shall make use of this by dealing with functions which are
symmetrical in $x_i$ (and therefore in $v_i$) for then we may transform to the variates $y_i$ (and
therefore $u_i$) for clearly

$$\sum_{r+1}^{s} x_i = \sum_{r+1}^{s} y_i, \quad \text{etc.}$$
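The practical value of (3-6) is that conditional moments of a single $u_i$ reduce to integrals of the standard normal density over $(t_1, t_2)$. The sketch below is only an illustration: it compares a midpoint-rule integration with the closed forms $(z_1 - z_2)/(p_2 - p_1)$ and $1 + (t_1 z_1 - t_2 z_2)/(p_2 - p_1)$ for the first two moments, which follow by direct integration. The function names are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for $z$ and $P$.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal: pdf is z(.), cdf is P(.)

def truncated_moments_numeric(p1, p2, steps=20000):
    """E(u | r, s) and E(u^2 | r, s) under (3-6), by the midpoint rule."""
    t1, t2 = _N.inv_cdf(p1), _N.inv_cdf(p2)
    h = (t2 - t1) / steps
    m1 = m2 = 0.0
    for k in range(steps):
        u = t1 + (k + 0.5) * h
        zu = _N.pdf(u)
        m1 += u * zu * h
        m2 += u * u * zu * h
    return m1 / (p2 - p1), m2 / (p2 - p1)

def truncated_moments_closed(p1, p2):
    """The same two moments from integration by parts."""
    t1, t2 = _N.inv_cdf(p1), _N.inv_cdf(p2)
    z1, z2 = _N.pdf(t1), _N.pdf(t2)
    mean = (z1 - z2) / (p2 - p1)
    second = 1.0 + (t1 * z1 - t2 * z2) / (p2 - p1)
    return mean, second
```

Both functions require $0 < p_1 < p_2 < 1$, since `inv_cdf` is undefined at 0 and 1.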
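4. MAXIMUM LIKELIHOOD ESTIMATES

Let $\hat\mu$, $\hat\sigma$ be the maximum likelihood estimates of $\mu$ and $\sigma$. Since

$$\log p(x_{r+1}, \ldots, x_s, r, s) = \log\{n!/r!\,(n-s)!\} - (s-r)\log\sigma\sqrt{(2\pi)} - \tfrac12\sum(x_i - \mu)^2/\sigma^2 + r\log F(d_1) + (n-s)\log\{1 - F(d_2)\}, \tag{4-1}$$

we have that

$$\frac{\partial}{\partial\mu}\log p = \frac{1}{\sigma^2}\sum(x_i - \mu) - r\frac{f(d_1)}{F(d_1)} + (n-s)\frac{f(d_2)}{1 - F(d_2)}, \tag{4-2}$$

$$\frac{\partial}{\partial\sigma}\log p = -\frac{s-r}{\sigma} + \frac{1}{\sigma^3}\sum(x_i - \mu)^2 - r\,\frac{d_1 - \mu}{\sigma}\,\frac{f(d_1)}{F(d_1)} + (n-s)\,\frac{d_2 - \mu}{\sigma}\,\frac{f(d_2)}{1 - F(d_2)}. \tag{4-3}$$

The likelihood equations are therefore

$$\frac{1}{\hat\sigma}\sum(x_i - \hat\mu) - \frac{r\,z(\hat t_1)}{P(\hat t_1)} + \frac{(n-s)\,z(\hat t_2)}{1 - P(\hat t_2)} = 0, \tag{4-4}$$

$$\frac{1}{\hat\sigma^2}\sum(x_i - \hat\mu)^2 - \frac{r\,\hat t_1 z(\hat t_1)}{P(\hat t_1)} + \frac{(n-s)\,\hat t_2 z(\hat t_2)}{1 - P(\hat t_2)} = (s - r), \tag{4-5}$$

where $\hat t_j = (d_j - \hat\mu)/\hat\sigma$ $(j = 1, 2)$.

We may evaluate the expected value of the second derivatives, for

$$\mathscr{E}\sum(x_i - \mu)/\sigma = \mathscr{E}\sum v_i = \mathscr{E}\sum u_i = -n(z_2 - z_1) \tag{4-6}$$

and similarly

$$\mathscr{E}\sum(x_i - \mu)^2/\sigma^2 = n(p_2 - p_1 - t_2 z_2 + t_1 z_1), \tag{4-7}$$

so that we arrive at

$$-\mathscr{E}\frac{\partial^2}{\partial\mu^2}\log p = \frac{n}{\sigma^2}\Bigl[\Bigl(p_2 + \frac{z_2^2}{q_2} - t_2 z_2\Bigr) - \Bigl(p_1 - \frac{z_1^2}{p_1} - t_1 z_1\Bigr)\Bigr], \tag{4-8}$$

$$-\mathscr{E}\frac{\partial^2}{\partial\mu\,\partial\sigma}\log p = \frac{n}{\sigma^2}\Bigl[\Bigl(\frac{t_2 z_2^2}{q_2} - z_2 - t_2^2 z_2\Bigr) + \Bigl(\frac{t_1 z_1^2}{p_1} + z_1 + t_1^2 z_1\Bigr)\Bigr], \tag{4-9}$$

$$-\mathscr{E}\frac{\partial^2}{\partial\sigma^2}\log p = \frac{n}{\sigma^2}\Bigl[\Bigl(2p_2 + \frac{t_2^2 z_2^2}{q_2} - t_2 z_2 - t_2^3 z_2\Bigr) - \Bigl(2p_1 - \frac{t_1^2 z_1^2}{p_1} - t_1 z_1 - t_1^3 z_1\Bigr)\Bigr]. \tag{4-10}$$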

We may thus determine the variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ for any values of $p_1$
and $p_2$; elements in this matrix are given in Table 1 for various values of $p_1$ and $p_2$ and it
may be noticed that

(i) If $p_2 > 0.50$, then var $(\hat\mu)$ varies little with $p_1$ provided that $p_1$ remains less than $\tfrac12$.

(ii) If $p_2 < 0.50$ and $p_2 - p_1$ is fixed, then var $(\hat\mu)$ decreases quite rapidly as $p_2$ closes to
0.50, this being particularly the case when $p_2 - p_1$ is small.

(iii) If $p_2 - p_1$ is fixed, var $(\hat\sigma)$ varies little and is minimal when $p_1 = 0$ or when $p_2 = 1$.

For fixed $p_2 - p_1$ therefore we verify the already intuitive idea that the best estimates of
$\sigma$ will result when the tail observations are available and that the best estimates of $\mu$ will
obtain when the available observations are in the centre of the distribution.
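The entries of Table 1 can be recomputed by inverting the expected information matrix. The sketch below is illustrative and rests on the assumption that the expected second derivatives take the form obtained by differentiating the log-likelihood of this section twice and taking expectations with $\mathscr{E}(r) = np_1$ and $\mathscr{E}(n - s) = nq_2$; the function names are hypothetical. Results are in units of $\sigma^2/n$.

```python
from statistics import NormalDist

_N = NormalDist()

def _point(p, tail):
    """Return (t, z, z^2 / tail) at P(t) = p; every term vanishes for an empty tail."""
    if tail <= 0.0:
        return 0.0, 0.0, 0.0
    t = _N.inv_cdf(p)
    z = _N.pdf(t)
    return t, z, z * z / tail

def information(p1, p2):
    """Expected information for (mu, sigma), in units of n / sigma^2."""
    t1, z1, r1 = _point(p1, p1)          # lower censoring point, tail mass p1
    t2, z2, r2 = _point(p2, 1.0 - p2)    # upper censoring point, tail mass q2
    i_mm = (p2 - p1) - t2 * z2 + t1 * z1 + r2 + r1
    i_ms = (t2 * r2 - z2 * (1 + t2 * t2)) + (t1 * r1 + z1 * (1 + t1 * t1))
    i_ss = (2 * (p2 - p1) - t2 * z2 * (1 + t2 * t2) + t1 * z1 * (1 + t1 * t1)
            + t2 * t2 * r2 + t1 * t1 * r1)
    return i_mm, i_ms, i_ss

def asymptotic_cov(p1, p2):
    """Return (var mu-hat, var sigma-hat, cov) in units of sigma^2 / n."""
    i_mm, i_ms, i_ss = information(p1, p2)
    det = i_mm * i_ss - i_ms * i_ms
    return i_ss / det, i_mm / det, -i_ms / det
```

As a check on the assumed formulas, `asymptotic_cov(0.0, 0.20)` agrees with the first block of Table 1 (5.7804, 3.5375, 3.7173) to the accuracy printed there, and the uncensored case `asymptotic_cov(0.0, 1.0)` gives the familiar values 1 and 1/2 with zero covariance.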
5. A LINEAR ESTIMATOR FOR $\sigma$ SYMMETRICAL IN $x_{r+1}, \ldots, x_s$


It will be conceded that the solution of the simultaneous equations (4-4) and (4-5) in $\mu$
and $\sigma$ presents computational difficulties. In order to overcome this and to gain advantage
from the simplicity of the joint distribution of the $y_i$ we investigate a linear estimator for
$\sigma$ symmetrical in $x_{r+1}, \ldots, x_s$; in its most general form this will be

$$L(\sigma: d_1, d_2) = a_0(\hat p)\,\frac{1}{s-r}\sum x_i + a_1(\hat p)\,d_1 + a_2(\hat p)\,d_2, \qquad |d_j| < \infty, \tag{5-1}$$

where $a_j(\hat p) = a_j(\hat p_1, \hat p_2)$ is a function of the observed ratios $\hat p_1 = r/n$ and $\hat p_2 = s/n$. We may
write (5-1) as

$$L(\sigma: d_1, d_2) = \mu\,(a_0(\hat p) + a_1(\hat p) + a_2(\hat p)) + \sigma\Bigl(a_0(\hat p)\,\frac{1}{s-r}\sum u_i + t_1 a_1(\hat p) + t_2 a_2(\hat p)\Bigr), \tag{5-2}$$

so that

$$\mathscr{E}(L(\sigma: d_1, d_2) \mid r, s) = \mu\,(a_0(\hat p) + a_1(\hat p) + a_2(\hat p)) + \sigma\Bigl(t_1 a_1(\hat p) + t_2 a_2(\hat p) - a_0(\hat p)\,\frac{z_2 - z_1}{p_2 - p_1}\Bigr). \tag{5-3}$$

We take

$$a_0(\hat p) + a_1(\hat p) + a_2(\hat p) = 0, \tag{5-4}$$

then $L(\sigma: d_1, d_2)$ will be unbiased to first order in $1/n$ if we take $\hat z_i = z(\hat t_i)$ such that $P(\hat t_i) = \hat p_i$
$(i = 1, 2)$ and

$$\hat t_1 a_1(\hat p) + \hat t_2 a_2(\hat p) - a_0(\hat p)\,\frac{\hat z_2 - \hat z_1}{\hat p_2 - \hat p_1} = 1. \tag{5-5}$$
Putti ‘ 0 Z_— 2 ’ an 
utting “= ap, an(P) + fatale) —dolp) (i = 1, 2), 
2 pP=p 
Po-P1 Pe-Py 
C = (%—%)/(P2—Py), J 





then the mean square error (M.S.E.) of L(o: d,,d,) is given to first order by 


1 
w.s.x.(L(a)|o) = >| (Dadt+ 2Prtedi dat Pate) +P) 14B,-B,) |, 67) 


and this is minimized, subject to the conditions of (5-4) and (5-5), by taking 


P11 92/% 7 — 2b, | ) 
(-1)'#14,D “7a(le— 4), | al ‘ t | (i = 1,2) 
= = r _ a = 2) > 
i 2;(Po— Py) |PaPs 12/Pi 2 2's | 
, = 1+B,—B, B,t,—B,t,—C | > (5°8) 


(a) D) (1+ B, — By) = (,D) 2, + (P2D) 22, 
D = (a)D) (By t, — Bat, —C) — (PD) 2, ty — (Pa D) Zale. 
Thus ¢,, ¢, and a,(p); then 
Ay = Ay By— 94%, Az = Ap By— Py. (5-9) 


The actual weights used are of course those obtained by replacing p, for p,; z, for z,; t; for 
t; in equations (5-8) and (5-9). The results simplify when d, = —co (i.e. when p, = 0), for 


1/s 





L(o: —o,d,) = > (x;—d,), (5-10) 


G+ %alpy) 7 










and M.S.E. (L(o: —00,d,)) = [1 + p2q.(1—B)/z3] (5-11) 
bia ii Nolte + 22/P2)" Pats wal 
2 
with B= ae . 
P2 Pp 


The computations involved in (5-8) and (5-9) are not worth the effort in terms of the very
small efficiency increase over another, extremely simple estimator for $\sigma$ which is to be
described in the next section. However, in the case of singly censored samples ($d_1 = -\infty$ or
$d_2 = +\infty$) we are committed to the use of $L(\sigma: -\infty, d_2)$ unless we voluntarily censor the
data below. Values of the mean square error of $L(\sigma: -\infty, d_2)$ are to be found in the first
column of Table 2.


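6. THE QUANTILE ESTIMATOR FOR $\sigma$


Consider the function

$$\Lambda(\sigma) = a(\hat p)\,(d_2 - d_1), \tag{6-1}$$

where $a(\hat p) = a(\hat p_1, \hat p_2)$. We may write this as

$$\Lambda(\sigma) = \sigma\,a(\hat p)\,(t_2 - t_1) \tag{6-2}$$

and observe that $\Lambda(\sigma)$ will be unbiased for $\sigma$ if $a(\hat p)$ is such that

$$\mathscr{E}\,a(\hat p)\,(t_2 - t_1) = 1. \tag{6-3}$$

If we write

$$a(\hat p) = 1/(\hat t_2 - \hat t_1) \quad (r \ne s), \tag{6-4}$$

then it is easily shown that the bias of $\Lambda(\sigma)$ is of order $1/n$. Writing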





i t, —t,)% [2 4 22do t t 
a(p) = (-%)2-! 2— 4) Pidi_ *P ide Pelt + (4—t,)( Pith _ aPets) | 
p=p 


2n 23 242s 2 a 2 
(6-5) 
then the bias of $\Lambda(\sigma)$ is of order $1/n^2$. In either case

$$\operatorname{var}\Bigl[\frac{\Lambda(\sigma)}{\sigma}\Bigr] = \frac{1}{n}\,\frac{1}{(t_2 - t_1)^2}\Bigl[\frac{p_1 q_1}{z_1^2} - \frac{2p_1 q_2}{z_1 z_2} + \frac{p_2 q_2}{z_2^2}\Bigr] + O(1/n^2). \tag{6-6}$$

The value of var $(\Lambda(\sigma)/\sigma)$ has been computed for several combinations of $p_1$ and $p_2$ and the
asymptotic efficiency relative to $\hat\sigma$ is given in Table 2, where it will be seen that, at least
in the range considered, the efficiency of this simple estimator is astonishingly high.
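The quantile estimator needs only the two censoring points and the two counts. The sketch below is a minimal illustration assuming the simple weight $a(\hat p) = 1/(\hat t_2 - \hat t_1)$ of (6-4) and the first-order variance read from (6-6); the function names are hypothetical, and `statistics.NormalDist.inv_cdf` from the Python standard library is used to obtain $\hat t_j$ from $P(\hat t_j) = \hat p_j$.

```python
from statistics import NormalDist

_N = NormalDist()

def quantile_sigma(d1, d2, r, s, n):
    """Quantile estimator of sigma: (d2 - d1) / (t2_hat - t1_hat).

    r : number of observations below d1
    s : number of observations below d2
    n : sample size
    """
    if not 0 < r < s < n:
        raise ValueError("need 0 < r < s < n so that both quantiles are finite")
    t1_hat = _N.inv_cdf(r / n)
    t2_hat = _N.inv_cdf(s / n)
    return (d2 - d1) / (t2_hat - t1_hat)

def quantile_sigma_nvar(p1, p2):
    """First-order value of n * var(Lambda(sigma) / sigma), as in (6-6)."""
    t1, t2 = _N.inv_cdf(p1), _N.inv_cdf(p2)
    z1, z2 = _N.pdf(t1), _N.pdf(t2)
    q1, q2 = 1.0 - p1, 1.0 - p2
    bracket = p1 * q1 / z1 ** 2 - 2 * p1 * q2 / (z1 * z2) + p2 * q2 / z2 ** 2
    return bracket / (t2 - t1) ** 2
```

For instance, with censoring points $-1$ and $1$ and 20 of 100 observations in each tail, `quantile_sigma(-1.0, 1.0, 20, 80, 100)` divides the distance 2 by the distance between the 20 % and 80 % points of the standard normal distribution.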


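7. A LINEAR ESTIMATOR FOR $\mu$ SYMMETRICAL IN $x_{r+1}, \ldots, x_s$

We take as estimator

$$L(\mu: d_1, d_2) = e_0(\hat p)\,\frac{1}{s-r}\sum x_i + d_1 e_1(\hat p) + d_2 e_2(\hat p), \tag{7-1}$$

where $e_j(\hat p) = e_j(\hat p_1, \hat p_2)$ $(j = 0, 1, 2)$ is a function of $\hat p_1 = r/n$ and $\hat p_2 = s/n$. We may write
(7-1) as

$$L(\mu: d_1, d_2) = \mu\,(e_0(\hat p) + e_1(\hat p) + e_2(\hat p)) + \sigma\Bigl[e_0(\hat p)\,\frac{1}{s-r}\sum u_i + t_1 e_1(\hat p) + t_2 e_2(\hat p)\Bigr], \tag{7-2}$$

so that

$$\mathscr{E}(L(\mu: d_1, d_2) \mid r, s) = \mu\,(e_0(\hat p) + e_1(\hat p) + e_2(\hat p)) + \sigma\Bigl[t_1 e_1(\hat p) + t_2 e_2(\hat p) - e_0(\hat p)\,\frac{z_2 - z_1}{p_2 - p_1}\Bigr]. \tag{7-3}$$

If we take

$$e_0(\hat p) + e_1(\hat p) + e_2(\hat p) = 1, \tag{7-4}$$

then $L(\mu: d_1, d_2)$ will be unbiased to order $1/n$ if

$$\hat t_1 e_1(\hat p) + \hat t_2 e_2(\hat p) - e_0(\hat p)\,\frac{\hat z_2 - \hat z_1}{\hat p_2 - \hat p_1} = 0. \tag{7-5}$$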
Writin 6, = 4 [¢ €,(p) + t2e2(p) )|. (7-6) 
g t= 9p, [2% P) + h2€a\P ot — o(P 


then with B,; and C defined as in (5-6), we find that L(y: d,,d,) has least mean square error 
when é,(p), €;() and e,(p) are determined from 








22 22t, to(t, —t,) 
-¢,K =2% rfa(la—h 1+B,—B,)+(B,t,—B,t,—C 
(~Via ee 2) + (Bit, —Bats—C) | 
P11 92/9; —&y — ab | 
x) P1P2de/P; —% —Zaty PL (ary 
(e)K) (B,t, — Bat, —C) = (0, K) zt, + (0,K) zote, 
K = (e,K) (1+ B,— By) — (0, K) z,— (0, K) 2p. J 
Thus é), 9, and @,;then ¢, = @)B,—0,2,, 2 = —€y)B,— 2p. (7-8) 
1+B, —B, 
Lastly m.s.E. (L(y: d,,d,)) = 7 | oa + 291929, 92+ 229263 + ea Ta age "|. (7-9) 
1 
1 t. 
If d, = —o, then L(y: —«,d 2S 2,4+— 4, | 7-10 
(u: 00,4) = Gas | Eat y (7-10) 


to%_ 25 


o2 
and M.S.E. (L(“: —00,d,)) = cath + alee al +n (t+ =)’ +4 (1-9-4) |. (7-11) 


Again we look at estimators which are nearly as efficient but more tractable. 


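8. ALTERNATIVE ESTIMATORS FOR $\mu$ OF HIGH EFFICIENCY


We now propose three estimators for $\mu$ which require very little labour in regard to their
computation. Defining

$$C = (z_2 - z_1)/(p_2 - p_1), \qquad \hat C = (\hat z_2 - \hat z_1)/(\hat p_2 - \hat p_1), \tag{8-1}$$

then an estimator for $\mu$ will be

$$\Lambda_1(\mu: d_1, d_2) = \frac{1}{\hat t_1 + \hat C}\Bigl(\hat t_1\,\frac{1}{s-r}\sum x_i + \hat C d_1\Bigr), \tag{8-2}$$

which we may write as

$$\Lambda_1(\mu: d_1, d_2) = \mu + \frac{\sigma}{\hat t_1 + \hat C}\Bigl(\hat t_1\,\frac{1}{s-r}\sum u_i + \hat C t_1\Bigr), \tag{8-3}$$

so that

$$\mathscr{E}[\Lambda_1(\mu: d_1, d_2) \mid r, s] = \mu + \frac{\sigma}{\hat t_1 + \hat C}\,(t_1\hat C - \hat t_1 C) \tag{8-4}$$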








and therefore the bias is of order $\sigma/n$. With $B_1$ and $B_2$ defined as before it is easily shown that,
to order $1/n^2$,


M.S.E. (A,(u: d,,d,)) = ——H, “ap th B, iki” te 2t, B,(t, B, “ae 
29 


+ (BP Pee + A +3,-25] _ (8) 
: 22 Ps-Py 
Alternatively we may use the estimator

$$\Lambda_2(\mu: d_1, d_2) = \frac{1}{\hat t_2 + \hat C}\Bigl(\hat t_2\,\frac{1}{s-r}\sum x_i + \hat C d_2\Bigr). \tag{8-6}$$


Its M.S.E. may be obtained from equation (8-5) by replacing t, by —#, in the numerator. 
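The third and simplest estimator is

$$\Lambda_3(\mu: d_1, d_2) = \frac{\hat t_2 d_1 - \hat t_1 d_2}{\hat t_2 - \hat t_1}, \tag{8-7}$$

which we may write as

$$\Lambda_3(\mu: d_1, d_2) = \mu + \sigma(\hat t_2 t_1 - \hat t_1 t_2)/(\hat t_2 - \hat t_1), \tag{8-8}$$

so that the bias is of order $\sigma/n$ after taking the expectation over $r$ and $s$. It is easily shown
that, to first order

$$\text{M.S.E.}\,(\Lambda_3(\mu: d_1, d_2)) = \frac{\sigma^2}{n}\,\frac{1}{(t_2 - t_1)^2}\Bigl[\frac{t_2^2\,p_1 q_1}{z_1^2} - \frac{2t_1 t_2\,p_1 q_2}{z_1 z_2} + \frac{t_1^2\,p_2 q_2}{z_2^2}\Bigr]. \tag{8-9}$$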
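The three estimators of this section can be sketched together. This is an illustration under stated assumptions rather than a definitive transcription: the third estimator follows (8-7) directly, while the forms used for $\Lambda_1$ and $\Lambda_2$ assume that each combines the mean of the available observations with one censoring point, weighted by $\hat t_j$ and $\hat C$, which is the reading consistent with the conditional expectation (8-4). The function name `mu_estimators` is hypothetical.

```python
from statistics import NormalDist

_N = NormalDist()

def mu_estimators(xs, d1, d2, r, n):
    """Return (Lambda_1, Lambda_2, Lambda_3) for a doubly censored sample.

    xs : the s - r available observations lying between d1 and d2
    r  : number of observations censored below d1
    n  : total sample size
    """
    s = r + len(xs)
    if not 0 < r < s < n:
        raise ValueError("need 0 < r < s < n (censoring on both sides)")
    p1, p2 = r / n, s / n
    t1, t2 = _N.inv_cdf(p1), _N.inv_cdf(p2)
    z1, z2 = _N.pdf(t1), _N.pdf(t2)
    c_hat = (z2 - z1) / (p2 - p1)
    x_bar = sum(xs) / len(xs)
    # Assumed forms of (8-2) and (8-6): mean of the available observations
    # combined with one censoring point.
    lam1 = (t1 * x_bar + c_hat * d1) / (t1 + c_hat)
    lam2 = (t2 * x_bar + c_hat * d2) / (t2 + c_hat)
    # (8-7): uses the censoring points and the counts only.
    lam3 = (t2 * d1 - t1 * d2) / (t2 - t1)
    return lam1, lam2, lam3
```

When the observed proportions equal $p_1$ and $p_2$ and the mean of the available observations equals its conditional expectation $\mu - \sigma C$, all three expressions return $\mu$ exactly, which gives a convenient sanity check.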





Which of the three estimators has the least mean-square error is governed by the prevailing
values of $p_1$ and $p_2$. We note that if $p_1 = 0$ we are committed to the use of $\Lambda_2(\mu: -\infty, d_2)$;
similarly if $p_2 = 1$ we are committed to the use of $\Lambda_1(\mu: d_1, \infty)$. Investigation over the
whole region $0 < p_1 < p_2 < 1$ indicates that


A, is to be preferred if 0 < p, < py < 0-50 or 0-50 < p, < p, < 1, 
A, is to be preferred if O<p,< 050 and 050<p,<1-p,, 
A, is to be preferred if O0<p,<050 and l-p,<p,<l. 


Values of the asymptotic efficiency of $\Lambda_1$, $\Lambda_2$ and $\Lambda_3$ are given in Table 3.


9. SUMMARY 


It is seen from Table 2 that when the sample is censored above and below the estimator
$\Lambda(\sigma: d_1, d_2)$ is of high asymptotic relative efficiency, at least in the range

$$0.20 \le p_1 < p_2 \le 0.80.$$

The estimator as defined in (6-2) and (6-4) is biased of order $1/n$ though this may be reduced
to order $1/n^2$ using (6-5) (and of course the bias may be reduced to any order in $1/n$ by a
more accurate evaluation of the expected value of $\Lambda(\sigma: d_1, d_2)$ than was thought necessary
for the purpose of this paper).

Similarly, the estimators $\Lambda_j(\mu: d_1, d_2)$ $(j = 1, 2, 3)$ may be used for doubly censored
samples as estimators of $\mu$. Which of these three estimators is to be preferred is governed
by the prevailing values of $p_1$ and $p_2$ and will therefore be indicated by the experimental
values of $\hat p_1$ and $\hat p_2$.

These estimators, simple in form and in evaluation, were suggested as an alternative to
maximum likelihood estimation, which necessitates the solution of two simultaneous
equations ((4-4) and (4-5)) by an iterative process (Cohen, 1957); or to the method wherein
the type I censored sample is treated as though it were type II censored. Although this last
method is probably acceptable for large samples, a few calculations for small samples seem
to indicate that the bias will often be quite large; also the experimental variance is tedious
to calculate since it is necessary to obtain the conditional variance in every situation
$1 < r < s < n$ with the associated probability of $(r, s)$ in order to obtain the overall variance.
On the other hand the variances of $\Lambda(\sigma: d_1, d_2)$ and $\Lambda_j(\mu: d_1, d_2)$ are comparatively very easy
to determine.

In the case where the sample is singly censored, for example, censored above at $d_2$, the
estimator $L(\mu: -\infty, d_2)$ is highly efficient. The estimator $L(\sigma: -\infty, d_2)$ is not, however,
especially attractive since the asymptotic efficiency can be low when $p_2 > 0.50$. It seems
likely that in this case it would be advantageous to consider a quadratic estimator for $\sigma^2$
rather than a linear estimator for $\sigma$.

There is the possibility in finite samples that $p_1 = 0$ or $p_2 = 1$. In this event the limiting 
form as $p_1 \to 0$ (or $p_2 \to 1$) of $L(\mu: d_1, d_2)$ or of $L(\sigma: d_1, d_2)$ should be used. There is 
considerable simplification; for example, if $p_1 = 0$ then $B_1$, defined in (5-6), is replaced by zero and 
By by a(t, +22)/Po- 

When, simultaneously, $p_1 = 0$ and $p_2 = 1$, we conveniently use the sample mean and 
standard deviation as estimators of $\mu$ and $\sigma$, respectively. It should be remembered that if 
$d_1$ and $d_2$ are finite, the probability that $p_1 = 0$ or $p_2 = 1$ tends to zero as $n$ tends to infinity. 
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eems Table 1. The asymptotic values of var (ji), var (@) and cov (fi; &) in units of o?/n 
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ance. | Pe \ 
“easy 0-20 57804 
35375 
3-7173 
2» the 0-25 4-0238 ‘11-7985 
ever, 27605 18-8166 
ecems 2-5493 13-7156 
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aan 1-6155 4-0942 5-7376 9-0431 19-0122 
aa 1-0259 1-8973 2-3767 3-2819 58814 
spt 0-45 1-7128 1-8312 1-8653 1-9098 1-9811 2-1583 
nity. 1-4071 3-1544 4-1317 57715 9-0737 19-0370 
0-7853 1-2323 1-4147 1-6848 2-1695 3-4982 
0-50 15171 1-5565 1-5615 1-5654 1-5682 1-5699 1-5706 
1-2414 2-5378 3-1821 41559 57921 9-0897 19-0464 
0-6052 0-8208 0-8777 0-9389 1-0057 1-0791 1-1607 
mples 0-55 1-3761 1-3859 1-3859 1-3866 1-3901 1-4011 1-4351 
1-1066 2-1039 2-5582 31993 41696 58016 9-0942 
rom a 0-4674 0-5488 0-5467 0-5255 0-4683 0-3344 Zero 
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al 0-9948 1-7830 2-1186 25698 3-2075 4-1740 
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intries for (p;, P,) and (q, q,) are identical. 
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Table 2. (a) Asymptotic efficiency relative to & of L(a: — 00, d,). 
(b) Asymptotic efficiency relative to & of A(o: d,, dz) 


(a) (0) 











0-00 0-20 0-25 0-30 0-35 0-40 
0-9256 

9141 0-9992 

*9022 *9962 0-9994 

*8944 *9927 -9980 0-9997 

*8763 -9906 -9966 -9990 0-9999 

-8619 *9885 *9954 *9985 -9997 1-0000 

*8460 -9869 9946 -9980 *9995 0-9999 
0-8264 0-9858 0-9940 0-9977 0-9993 0-9998 

-8086 9851 -9936 *9975 -9991 *9997 

*7858 *9845 -9930 -9969 *9985 

*7591 -9833 -9916 9953 

*7271 *9805 *9882 

*6873 -9739 


Table 3. (a) Asymptotic efficiency relative to fi of L(u: — 00, dz). 
(b) Asymptotic efficiency relative to fi of Az(m: d,, de) 


(2) (6) 
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-9896 +8923 -9189 


-9929 +8589 


0-45 


0-9999 


0-9992 





00 


99 















Normal population parameters 


Table 3. (c) Asymptotic efficiency of $A_1(\mu: d_1, d_2)$ relative to $\hat{\mu}$ 
in the region $0.20 < p_1 < 0.50 < p_2 < 0.80$ 


\ Pi 
oS 0-20 0-25 0-30 0-35 0-40 0-45 
Py \ 
0-55 0-9987 0-9998 0-9989 09954 0-9808 0-9738 
-60 -9990 -9967 -9884 -9766 0-9587 
*65 *9937 *9847 -9708 *9513 
-70 *9845 -9696 *9493 
75 *9722 °9517 
80 *9573 


Note regarding Tables 2 and 3 


In Tables 2 (b) and 3 (b), entries for $(p_1, p_2)$ and $(q_2, q_1)$ are identical so that, for example, the efficiency 
of $A(\sigma: d_1, d_2)$ for the case $p_1 = 0.60$, $p_2 = 0.75$ is 0.9966. This is also true of Table 1. 

In Table 3(c), the efficiency of A,(u:d,,d,) for (p,, p,) is the same as that of A,(u: —d,,—d,) for 
(de, 91) 80 that, for example, the efficiency of A,(u: d,,d,) for the case p, = 0-40, p, = 0-65 is 0-9766. 

In Tables 2 (a) and 3(a), the entry for (0-00, p,) is the same as that for (q., 1-00) so that, for example, 
the efficiency of L(y: d,,—00) when p, = 0-40 is 0-9764. 
Some tests for outliers† 


By C. P. QUESENBERRY‡ and H. A. DAVID 


Virginia Polytechnic Institute 


1. INTRODUCTION 


A recent issue of Technometrics (Vol. 2, 1960, pp. 123-66) contained two papers and an 
accompanying discussion on the rejection and location of outlying observations. It was 
emphasized, particularly in the discussion, that there might be several ways of approaching 
the problem, which depended to a large extent on the object in view. One might, for in- 
stance, be primarily interested in pruning the observations in order to secure a more 
accurate analysis of what was left, e.g. to obtain the most reliable estimate of a mean. Or 
one might be particularly interested in identifying which were the genuinely exceptional 
observations, in order to create a new insight into the phenomena under study. In the first 
case the criterion of what was best might be the effect on the standard error of estimation, 
in the second case the risk of wrongly deciding whether an observation was exceptional 
or not. The procedures discussed in the following paper start from the basis of risks of 
misclassification rather than of estimation errors. 

The particular problem which we take is that of detecting outlying observations when in 
addition to the normal sample $x_1, x_2, \ldots, x_n$ at hand an independent mean-square estimate 
$s_\nu^2$ of the common variance $\sigma^2$ is available. The same situation has been considered by Nair 
(1948). To test for one outlier at a specified end of the sample he proposes using the ratio of 
the extreme deviate from the sample mean to $s_\nu$. For two-sided testing the extreme absolute 
deviate from the sample mean divided by $s_\nu$ has been proposed by Halperin, Greenhouse, 
Cornfield & Zalokar (1955). 

The two statistics mentioned above do not make use of the variance estimate $s^2$ from the 
sample and for this reason do not possess certain desirable optimal properties. We shall 
propose statistics with the same numerators as those above but with $s_\nu$ in the denominators 
replaced by the pooled estimate 


$$s^{*} = \{[(n-1)s^2 + \nu s_\nu^2]/(n+\nu-1)\}^{1/2}.$$


Kudô (1956) has shown that among a suitably restricted class of tests these statistics 
maximize the probability of rejecting the null hypothesis of homogeneity of the sample 
in the presence of a single outlier. 

We shall develop a method for computing percentage points of these statistics and present 
the computed tables. These tables are also immediately applicable to the problem of slippage 
of means in normal samples. Two examples illustrate the procedures. 


† This research was supported in part by a National Science Foundation Fellowship and in part by the 
Office of Ordnance Research, U.S. Army. 
‡ Now at Montana State College. 
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2. NOTATION AND DEFINITIONS 


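Let $x_1, x_2, \ldots, x_n$ denote the sample in the order drawn. Then define 


$$\bar{x} = \sum_{i=1}^{n} x_i/n, \qquad s^2 = \sum_{i=1}^{n} (x_i-\bar{x})^2/(n-1),$$


$s_\nu^2$ is an independent mean-square estimate of variance, $\sigma^2$, with $\nu$ degrees of freedom, 


$$S^2 = (n-1)s^2 + \nu s_\nu^2,$$


$$b_i = (x_i-\bar{x})/S, \tag{2-1}$$
$$b = \max_i b_i = (x_{\max}-\bar{x})/S, \tag{2-2}$$
$$b^{*} = \max_i |b_i|. \tag{2-3}$$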


Then $b$ and $b^{*}$ are essentially the one-sided and two-sided statistics, respectively, 
discussed in §1. It should be noted that $S^2$ has not been divided by its degrees of freedom. 

The special case $\nu = 0$ has been treated by Pearson & Chandrasekhar (1936), Grubbs 
(1950) and Borenius (1958). 


3. DISTRIBUTION THEORY 


For the work in later sections the distribution of $b_i$ and the joint distribution of $b_i$ and 
$b_j$ are needed. We shall now obtain these distributions, taking for definiteness $i = 1$ and 
$j = 2$. An extremely complicated derivation of essentially the same distributions has been 
given by Doornbos, Kesten & Prins (1956) in an article concerned with slippage tests. 

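As is well known, $(n-1)s^2$ may be decomposed into two independent components 


$$\frac{n}{n-1}(x_1-\bar{x})^2 + \chi^2_{(n-2)}\sigma^2,$$


which are distributed respectively as $\chi^2\sigma^2$ with 1 and $n-2$ degrees of freedom. With the 
same notation we have, therefore, 

$$S^2 = \frac{n}{n-1}(x_1-\bar{x})^2 + \chi^2_{(n+\nu-2)}\sigma^2. \tag{3-1}$$

Then $b_1$ may be written (with the sign of $x_1-\bar{x}$) as 


$$b_1 = \left(\frac{n}{n-1} + \frac{\chi^2_{(n+\nu-2)}\sigma^2}{(x_1-\bar{x})^2}\right)^{-1/2} = \left(\frac{n-1}{n}\right)^{1/2}\left(1 + \frac{n+\nu-2}{t^2_{(n+\nu-2)}}\right)^{-1/2},$$


where $t_{(n+\nu-2)}$ denotes a $t$-variate with $n+\nu-2$ degrees of freedom. It follows that the density 
function of $b_1$ is 


$$f(b_1) = \left(\frac{n}{n-1}\right)^{1/2}\frac{\Gamma[\tfrac12(n+\nu-1)]}{\sqrt{\pi}\,\Gamma[\tfrac12(n+\nu-2)]}\left(1-\frac{n}{n-1}b_1^2\right)^{\frac12(n+\nu-4)}, \qquad -\left(\frac{n-1}{n}\right)^{1/2} \le b_1 \le \left(\frac{n-1}{n}\right)^{1/2}. \tag{3-2}$$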


This generalization of a result due to Thompson (1935) has also recently been pointed out 
by Anscombe (1960). 
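
As a numerical sanity check on (3-2), the following minimal sketch integrates the density by Simpson's rule. It assumes the density has the form $c_n[1 - nb_1^2/(n-1)]^{\frac12(n+\nu-4)}$ on $|b_1| \le [(n-1)/n]^{1/2}$; the function names and the values $n = 6$, $\nu = 5$ are illustrative choices, not part of the paper. By symmetry among the $b_i$, the second moment should equal $(n-1)/[n(n+\nu-1)]$, which the sketch also evaluates.

```python
import math

def density_b1(b1, n, nu):
    """Density of b_1 in the assumed form of (3-2); zero outside its support."""
    a = n / (n - 1)
    base = 1.0 - a * b1 * b1
    if base <= 0.0:
        return 0.0
    const = (math.sqrt(a / math.pi)
             * math.gamma(0.5 * (n + nu - 1)) / math.gamma(0.5 * (n + nu - 2)))
    return const * base ** (0.5 * (n + nu - 4))

def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

n, nu = 6, 5                       # illustrative values only
edge = math.sqrt((n - 1) / n)      # the support is [-edge, edge]
total_mass = simpson(lambda b: density_b1(b, n, nu), -edge, edge)
second_moment = simpson(lambda b: b * b * density_b1(b, n, nu), -edge, edge)
expected_second_moment = (n - 1) / (n * (n + nu - 1))
```

The total mass should be close to one and the two second-moment values should agree to within the integration error.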
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Continuing the decomposition of (3-1) one step further we have 


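$$S^2 = \frac{n}{n-1}(x_1-\bar{x})^2 + \frac{n-1}{n-2}(x_2-\bar{x}')^2 + \chi^2_{(n+\nu-3)}\sigma^2,$$
with $\bar{x}' = \sum_{i=2}^{n} x_i/(n-1)$. 
Let $b_2' = (x_2-\bar{x}')/S'$, 
where $S'^2 = S^2 - \dfrac{n}{n-1}(x_1-\bar{x})^2$. 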


Clearly, the distribution of $b_2'$ is of the form (3-2) with $n$ replaced by $n-1$. Moreover, $b_2'$ is 
independent of $b_1$. To see this suppose $s_\nu^2$ is based on a random sample of size $\nu+1$, with mean 
$\bar{x}_\nu$, taken from a $N(\mu_\nu, \sigma^2)$ parent. This assumption is unnecessarily restrictive but does not 
essentially affect the argument. Then $\bar{x}'$, $\bar{x}_\nu$ and $S'^2$ are complete sufficient statistics for $\mu$, $\mu_\nu$ 
and $\sigma^2$. Since the distribution of $b_2'$ does not involve these parameters, $b_2'$ is independent 
of the joint distribution of $\bar{x}'$, $\bar{x}_\nu$ and $S'$ (Basu, 1955). Also $b_2'$ does not involve $x_1$ so that it 
must be independent of $b_1$. 
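The joint density function of $b_1$ and $b_2'$ is therefore 


$$f(b_1, b_2') = \left(\frac{n}{n-2}\right)^{1/2}\frac{n+\nu-3}{2\pi}\left(1-\frac{n}{n-1}b_1^2\right)^{\frac12(n+\nu-4)}\left(1-\frac{n-1}{n-2}b_2'^2\right)^{\frac12(n+\nu-5)},$$

$$-\left(\frac{n-1}{n}\right)^{1/2} \le b_1 \le \left(\frac{n-1}{n}\right)^{1/2}, \qquad -\left(\frac{n-2}{n-1}\right)^{1/2} \le b_2' \le \left(\frac{n-2}{n-1}\right)^{1/2}.$$


Since 

$$b_2' = \frac{b_2 + b_1/(n-1)}{[1 - nb_1^2/(n-1)]^{1/2}},$$

we obtain 

$$f(b_1, b_2) = \left(\frac{n}{n-2}\right)^{1/2}\frac{n+\nu-3}{2\pi}\left(1 - \frac{n-1}{n-2}b_1^2 - \frac{2b_1b_2}{n-2} - \frac{n-1}{n-2}b_2^2\right)^{\frac12(n+\nu-5)} \tag{3-3}$$

over the ellipse $(n-1)(b_1^2+b_2^2) + 2b_1b_2 \le n-2$, and 
$= 0$ elsewhere. 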


Moments of b and b* 


Since the distributions of $b$ and $b^{*}$ do not involve $\mu$, $\mu_\nu$ and $\sigma^2$ it follows as above that $b$ 
and $b^{*}$ are distributed independently of $S$. This result is well known for the special case 
$\nu = 0$ and may indeed be proved in a similar fashion, as was pointed out to us by Dr G. E. P. 
Box. As a consequence of this independence the moments (about zero) of $b$ and $b^{*}$ are the 
ratios of the moments of their respective numerators and denominators. Thus we have, 
for example, 

$$E(b^r) = E[(x_{\max}-\bar{x})^r]/E(S^r).$$


In this case (but not so readily for $b^{*}$) the right-hand side can be evaluated numerically 
since the cumulants of $x_{\max}-\bar{x}$ are related to the tabulated cumulants of the extreme 
(Ruben, 1954) by equations which for $\mu = 0$, $\sigma = 1$ become (McKay, 1935) 

$$\kappa_r(x_{\max}-\bar{x}) = \kappa_r(x_{\max}) \quad (r = 1, 3, 4, 5, \ldots),$$


$$\kappa_2(x_{\max}-\bar{x}) = \kappa_2(x_{\max}) - 1/n.$$
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The distribution of b may therefore be approximated by a Pearson-type curve. However, 
for the purpose of obtaining upper percentage points the approach of the following section, 
applicable to both 6 and b*, is preferable. 


4. THE COMPUTATIONAL PROCEDURES 
(a) One-sided case, b 
We now consider a procedure for computing significance points of b defined by (2-2). 


For a given value of $n$ and $\nu$ let $D_\alpha$ be the required $\alpha$ significance point of $b$. By Bonferroni's 
inequalities (cf. David, 1956) we have for any $D$ 


$$n\Pr(b_i > D) - \binom{n}{2}\Pr(b_i > D, b_j > D) \le \Pr(b > D) \le n\Pr(b_i > D). \tag{4-1}$$


For sufficiently large $D$ the right side serves as a first approximation to $\Pr(b > D)$ and the 
left side as a second approximation. If the first approximation is set equal to $\alpha$, then the 
resulting equation, i.e. 

$$\Pr(b_1 > D) = \alpha/n,$$

can be solved for a value $D_u$, which is an upper bound of the value $D_\alpha$ sought. From (3-2) 
this equation can be written as 


$$\frac{\alpha}{n} = c_n\int_{D_u}^{[(n-1)/n]^{1/2}}\left[1-\frac{n}{n-1}b_1^2\right]^{\frac12(n+\nu-4)}db_1,$$

where 

$$c_n = \left[\frac{n}{(n-1)\pi}\right]^{1/2}\frac{\Gamma[\tfrac12(n+\nu-1)]}{\Gamma[\tfrac12(n+\nu-2)]}.$$


The following equivalent equation is more convenient to work with: 


$$B_n = \int_0^{D_u}\left[1-\frac{n}{n-1}x^2\right]^{\frac12(n+\nu-4)}dx = D_u + \sum_{r=1}^{\infty}\frac{(-1)^r[n+\nu-4][n+\nu-6]\cdots[n+\nu-2(r+1)]\,n^r D_u^{2r+1}}{r!\,2^r(n-1)^r(2r+1)}, \tag{4-2}$$

where $B_n = (\tfrac12 - \alpha/n)/c_n$. 


By transposing the second term on the right to the left of (4-2) the equation can be identified 
with $h(D_u) = 0$, so that Newton's iterative formula 

$$D_{u,i} = D_{u,i-1} - \frac{h(D_{u,i-1})}{h'(D_{u,i-1})}$$

can be used to solve for $D_u$. Note also 

$$h'(D_u) = \left[1-\frac{n}{n-1}D_u^2\right]^{\frac12(n+\nu-4)}.$$


The initial value of $D_u$ used to start the iteration procedure was $D_{u,0} = B_n$, for $B_n$ as given 
above. 
While $D_u$ is an upper bound for $D_\alpha$, a lower bound $D_l$ by (4-1) satisfies 


$$n\Pr(b_i > D_l) - \binom{n}{2}\Pr(b_i > D_l, b_j > D_l) = \alpha.$$

A first approximation $D_{l,0}$ to $D_l$ is, therefore, given by 

$$n\Pr(b_i > D_{l,0}) = \alpha + \binom{n}{2}\Pr(b_i > D_u, b_j > D_u). \tag{4-3}$$


On replacing $D_u$ in (4-3) by $D_{l,0}$ a second approximation $D_{l,1}$ is obtained. The process can 
be continued until $D_{l,i+1}$ and $D_{l,i}$ agree to three decimal places. In the present case $D_{l,1}$ 
was found to be sufficiently accurate in all but a few cases. 

The second term on the right side of equation (4-3) is evaluated by numerical integration. 
The joint density $f(b_1, b_2)$ given by (3-3) is integrated over the region for which $b_1 > D_u$ and 
$b_2 > D_u$. This region is shaded in Fig. 1. This numerical integration was performed on an 
I.B.M. 650 computer. The numerical method used is equivalent to fitting an increasing 
number of planes to the density surface until the desired accuracy is achieved. 
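
A rough sketch of this integration is given below. It is not the I.B.M. 650 procedure: it uses a plain midpoint rule on a square grid, and it assumes the joint density (3-3) has the form $(n/(n-2))^{1/2}\,\frac{n+\nu-3}{2\pi}\,(1-q)^{\frac12(n+\nu-5)}$ with $q = [(n-1)(b_1^2+b_2^2) + 2b_1b_2]/(n-2)$ inside the ellipse $q \le 1$. The function names and grid size are illustrative.

```python
import math

def joint_density(b1, b2, n, nu):
    """Assumed form of the joint density (3-3); zero outside the ellipse."""
    q = ((n - 1) * (b1 * b1 + b2 * b2) + 2.0 * b1 * b2) / (n - 2)
    if q >= 1.0:
        return 0.0
    const = math.sqrt(n / (n - 2)) * (n + nu - 3) / (2.0 * math.pi)
    return const * (1.0 - q) ** (0.5 * (n + nu - 5))

def joint_tail(D, n, nu, cells=300):
    """Midpoint-rule estimate of Pr(b1 > D, b2 > D)."""
    edge = math.sqrt((n - 1) / n)      # largest value either b can take
    if D >= edge:
        return 0.0
    h = (edge - D) / cells
    total = 0.0
    for i in range(cells):
        x = D + (i + 0.5) * h
        for j in range(cells):
            y = D + (j + 0.5) * h
            total += joint_density(x, y, n, nu)
    return total * h * h

n, nu = 6, 5                                            # illustrative values only
whole_mass = joint_tail(-math.sqrt((n - 1) / n), n, nu)  # integrates over the full ellipse
tail_small_D = joint_tail(0.30, n, nu)
tail_large_D = joint_tail(0.60, n, nu)   # 0.60 > [(n-2)/2n]^{1/2}, so the region is empty
```

The last value illustrates the remark about Fig. 1: once $D$ exceeds $[(n-2)/2n]^{1/2}$ the square $b_1 > D$, $b_2 > D$ no longer meets the ellipse and the correction term is exactly zero.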











Fig. 1. Positive region of f(b;, b;). 


An examination of Fig. 1 shows that if $D_u > [(n-2)/2n]^{1/2}$ then the second term on the 
right side of (4-3) is zero. Then $D_l = D_u = D_\alpha$ is the exact percentage point of $b$. This is 
important in that it allows the exact calculation of a number of percentage points for lower 
values of $n$ and $\nu$. 

The lower and upper bounds for the percentage points were found to agree so well for 
the values of $\alpha$ considered here (0.05, 0.01) that only one value had to be tabulated. Tables 
1 and 2 give the 5 and 1 % points, respectively, for selected values of $\nu$ and $n$. 

It will be noted that the entries in the row $\nu = 0$, when divided by $\sqrt{n}$, correspond to those 
given by Chandrasekhar & Pearson (1936, p. 318) for the upper 5 and 1 % points of r. 


(b) The two-sided case, b* 


Essentially the same procedure is used to obtain the significance points of $b^{*}$ as for $b$. 
The Bonferroni inequalities in this case give 


$$n\Pr(|b_i| > D) - \binom{n}{2}\Pr(|b_i| > D, |b_j| > D) \le \Pr(b^{*} > D) \le n\Pr(|b_i| > D). \tag{4-4}$$
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From the symmetry of f(b;) we have 
$$\Pr(|b_i| > D) = 2\Pr(b_i > D).$$


Let $D^{*}_\alpha$ be the desired significance point of $b^{*}$. Then an upper bound $D^{*}_u$ can be obtained 
from (4-2) by replacing $\alpha$ by $\tfrac12\alpha$ in $B_n$ and solving. A first approximation $D^{*}_{l,0}$ to a lower 
bound $D^{*}_l$ on $D^{*}_\alpha$ is given by 


$$n\Pr(b_i > D^{*}_{l,0}) = \tfrac12\left[\alpha + \binom{n}{2}\Pr(|b_i| > D^{*}_u, |b_j| > D^{*}_u)\right]. \tag{4-5}$$


A second approximation $D^{*}_{l,1}$ can be obtained by replacing $D^{*}_u$ by $D^{*}_{l,0}$ in (4-5), etc. The 
second term on the right of (4-5) is evaluated this time by integrating $f(b_1, b_2)$ over the area 
in each quadrant where $|b_i| > D^{*}_u$ and $|b_j| > D^{*}_u$. The bounds on $D^{*}_\alpha$ do not agree as well as 
for the one-sided case. Tables 3 and 4 give bounds for $D^{*}_\alpha$ for $\alpha = 0.05$ and $\alpha = 0.01$, 
respectively. When the bounds agree to three places only one value is tabulated. 


5. THE SLIPPAGE PROBLEM 


The statistics $b$ and $b^{*}$ are useful in treating the slippage problem for normal 
populations. Here we have the sample 


$$\begin{array}{l} x_{11}, x_{12}, \ldots, x_{1n_1}, \\ x_{21}, x_{22}, \ldots, x_{2n_2}, \\ \quad\vdots \\ x_{k1}, x_{k2}, \ldots, x_{kn_k}. \end{array} \tag{5-1}$$


We wish to test the hypothesis that the entire array is from a common normal parent 
against the alternative that the $i$th sample $(x_{i1}, x_{i2}, \ldots, x_{in_i})$ is from a normal parent with a 
different mean, where $i$ is unspecified. 

Let $\bar{x}_i$ denote the mean of the $i$th sample and $\bar{x}$ the mean of the entire array, 
and $s^2$ be an independent mean-square estimate of error with $t$ degrees of freedom. 
An important special case is that of all equal subsample sizes, i.e. $n_1 = n_2 = \ldots = n_k = m$. 
For this special case the statistics 


$$\max_i \frac{m^{1/2}(\bar{x}_i-\bar{x})}{S_1} \tag{5-2}$$

and 

$$\max_i \frac{m^{1/2}|\bar{x}_i-\bar{x}|}{S_1}, \tag{5-3}$$

where 

$$S_1^2 = \sum_{i=1}^{k}\sum_{j=1}^{m}(x_{ij}-\bar{x})^2 + ts^2,$$

are distributed as $b$ and $b^{*}$, respectively. The significance points of these statistics are 
obtained from the tables of $b$ and $b^{*}$ with the parameters $n$ and $\nu$ of the tables replaced by 
$n = k$ and $\nu = k(m-1)+t$. These slippage tests possess the same desirable properties as 
do outlier tests based on $b$ and $b^{*}$ (see Paulson, 1952). 
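
The equal-size slippage statistics can be sketched as below. The sketch assumes that (5-2) and (5-3) are $\max_i m^{1/2}(\bar{x}_i-\bar{x})/S_1$ and $\max_i m^{1/2}|\bar{x}_i-\bar{x}|/S_1$ with $S_1^2 = \sum\sum(x_{ij}-\bar{x})^2 + ts^2$; the function name and the toy data are made up for illustration.

```python
import math

def slippage_statistics(samples, ss_indep=0.0):
    """One-sided and two-sided slippage statistics for k samples of common size m.

    ss_indep stands for t * s^2, the independent sum of squares (0.0 if none).
    """
    m = len(samples[0])
    if any(len(s) != m for s in samples):
        raise ValueError("all samples must have the same size")
    all_obs = [v for s in samples for v in s]
    grand_mean = sum(all_obs) / len(all_obs)
    S1 = math.sqrt(sum((v - grand_mean) ** 2 for v in all_obs) + ss_indep)
    scaled = [math.sqrt(m) * (sum(s) / m - grand_mean) / S1 for s in samples]
    return max(scaled), max(abs(v) for v in scaled)

# Toy data: k = 3 samples of size m = 3, no independent estimate (t = 0).
toy = [[1, 2, 3], [5, 6, 7], [6, 7, 8]]
one_sided, two_sided = slippage_statistics(toy)
k, m, t = 3, 3, 0
table_n, table_nu = k, k * (m - 1) + t    # parameters with which to enter the tables
```

Here the lowest sample mean is the most extreme, so the two statistics differ: the one-sided value is 0.5 and the two-sided value is 0.75, to be referred to the tables with $n = 3$ and $\nu = 6$.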

If the sample sizes $n_i$ are not all equal but are approximately so the tables of $b$ and $b^{*}$ can 
be used to obtain approximate tests. Put 


$$\bar{x}_w = \sum_{i=1}^{k} n_i^{1/2}\bar{x}_i/k, \qquad N = \sum_{i=1}^{k} n_i. \tag{5-4}$$

Then the statistics 

$$\max_i \frac{n_i^{1/2}\bar{x}_i-\bar{x}_w}{S_1} \tag{5-5}$$

and 

$$\max_i \frac{|n_i^{1/2}\bar{x}_i-\bar{x}_w|}{S_1} \tag{5-6}$$


give approximate tests based on the tables of $b$ and $b^{*}$. The parameters $n$ and $\nu$ of the tables 
are here $n = k$ and $\nu = N+t-k$. 


6. EXAMPLES 
We now give two examples to illustrate the use of the tables. 


Example 1 


Squibs are small devices for igniting the rocket motors of missiles. Watertightness and 
shock resistance are important characteristics of squibs. In order to study these character- 
istics of a large batch a random sample of size 48 was drawn. The sample was randomly 
subdivided into 3 equal groups. The first group was used as a control unit and received no 
treatment, the second group was submerged in water and the third group was dropped from 
a fixed height. Each squib in the entire sample was tested by having a current of 5 amperes 
passed through it and its time to failure recorded. 


Table 6-1 
Control ($x_{1j}$)     Watertightness ($x_{2j}$)     Shock ($x_{3j}$) 
0.38    0.51          0.53    0.39                  0.51    0.35 
0.26    0.55          0.35    0.74                  0.63    0.41 
0.41    0.53          0.38    0.32                  0.46    0.49 
0.33    0.41          0.45    0.74                  0.47    0.40 
0.33    0.47          1.09    0.48                  0.42    0.58 
0.37    0.49          0.46    0.37                  0.45    0.46 
0.54    0.42          0.57    0.52                  0.41    0.38 
0.76    0.34          0.47    0.44                  0.39    0.48 
(Data furnished by Ordnance Missile Laboratories, ARGMA, AOMC, Redstone Arsenal, Alabama.) 
$\sum x_{1j} = 7.10$, $\sum x_{2j} = 8.30$, $\sum x_{3j} = 7.29$, 
$\bar{x}_1 = 0.4438$, $\bar{x}_2 = 0.5188$, $\bar{x}_3 = 0.4556$, 
$\sum x_{1j}^2 = 3.3686$, $\sum x_{2j}^2 = 4.8768$, $\sum x_{3j}^2 = 3.4021$, 
$(\sum x_{1j})^2/16 = 3.1506$, $(\sum x_{2j})^2/16 = 4.3056$, $(\sum x_{3j})^2/16 = 3.3215$, 
$SS(1) = 0.2180$, $SS(2) = 0.5712$, $SS(3) = 0.0806$. 


From investigation of a large series of similar data it has been found that these delay 
times are approximately normally distributed, but that for reasons not fully understood, 
occasionally extremely large delay times occur. Because of this an outliers test was used on 
each subgroup to pick out such divergent observations. The variance of the bulk of normal 
observations was assumed to be constant throughout the experiment. The data are given in 
Table 6-1. 

First, we test each of the subgroups for outlying observations, in order to avoid such 
observations vitiating comparisons among treatments. For each test the variance estimates 
from the two remaining subgroups are used as an independent estimate of error. For the 
control group we have from (2-1) 


$$b = \frac{0.76-0.4438}{(0.8698)^{1/2}} = 0.3390.$$


Table 1 gives the 5 % point for $n = 16$ and $\nu = 30$ as approximately $b(0.05; 16, 30) = 0.384$, 
so the above value does not attain significance. 
For the watertightness group 

$$b = \frac{1.09-0.5188}{(0.8698)^{1/2}} = 0.6125,$$


a value which Table 2 shows to be clearly significant at the 0-01 level. The observation 
1-09 is therefore discarded. 

Now, since this observation was unduly inflating the variance estimate for the first test, 
we shall recompute that test with the observation omitted from the variance estimate. 
The test statistic becomes 

$$b = \frac{0.76-0.4438}{(0.5217)^{1/2}} = 0.4378,$$

and is to be compared with $b(0.05; 16, 29)$. 

Table 1 gives b(0-05; 15, 24) = 0-413 and increasing either n or v tends to decrease the 
percentage point, so the above statistic is significant at the 0-05 level but not at the 0-01 
level. The observation 0-76 is omitted from the control group, the sum of squares based on 
the remaining 15 observations being 0-1113. 

For testing the 3rd (shock) group $b = 0.2707$. This is not significant at the 0.05 level. 
Further tests in the subsamples lead to no more discarded observations. 
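
The outlier tests of this example can be retraced from the raw data of Table 6-1. The sketch below is only an arithmetic check: the helper names are hypothetical, the means are not rounded before use, and the last digit may therefore differ slightly from the printed values.

```python
import math

control = [0.38, 0.26, 0.41, 0.33, 0.33, 0.37, 0.54, 0.76,
           0.51, 0.55, 0.53, 0.41, 0.47, 0.49, 0.42, 0.34]
water = [0.53, 0.35, 0.38, 0.45, 1.09, 0.46, 0.57, 0.47,
         0.39, 0.74, 0.32, 0.74, 0.48, 0.37, 0.52, 0.44]
shock = [0.51, 0.63, 0.46, 0.47, 0.42, 0.45, 0.41, 0.39,
         0.35, 0.41, 0.49, 0.40, 0.58, 0.46, 0.38, 0.48]

def sum_of_squares(x):
    """Sum of squared deviations about the mean."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

def without_max(x):
    """Copy of x with one occurrence of its largest value removed."""
    y = list(x)
    y.remove(max(y))
    return y

def b_statistic(group, other_groups):
    """(x_max - mean)/S, with the other groups supplying the independent sum of squares."""
    mean = sum(group) / len(group)
    S = math.sqrt(sum_of_squares(group) + sum(sum_of_squares(g) for g in other_groups))
    return (max(group) - mean) / S

b_control = b_statistic(control, [water, shock])               # first test of 0.76
b_water = b_statistic(water, [control, shock])                 # test of 1.09
water_trimmed = without_max(water)                             # 1.09 discarded
b_control_again = b_statistic(control, [water_trimmed, shock]) # recomputed test of 0.76
control_trimmed = without_max(control)                         # 0.76 discarded
b_shock = b_statistic(shock, [control_trimmed, water_trimmed]) # test for the shock group
```

The four values agree with 0.3390, the watertightness statistic (about 0.61), 0.4378 and 0.2707 to within rounding.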

The purpose of the experiment is to test the significance of the water and shock treatments. 
We are interested in testing the hypothesis that either one or both treatments increased the 
mean delay time. A two-sided test is appropriate here. For the two-sided test will have a 
probability of rejection higher than for the null situation under any alternative except 
when one treatment effect is exactly twice the other (both non-zero). 

Since the subsample sizes are large and nearly equal, the approximate test discussed in 
§5 should give accurate results. Here $n_1 = n_2 = 15$ and $n_3 = 16$. The weighted means are 
$\sqrt{n_1}\,\bar{x}_1 = 1.637$, $\sqrt{n_2}\,\bar{x}_2 = 1.862$, $\sqrt{n_3}\,\bar{x}_3 = 1.822$ and $\bar{x}_w = 1.774$. Then (5-6) gives 

$$b^{*} = \frac{1.774-1.637}{S_1} = 0.213.$$

Table 3 gives $b^{*}(0.05; 3, 40) > 0.283$ so the value 0.213 is not significant at the 0.05 level. 
We conclude that the treatment effects, if present, are only small. 


Example 2 


A sample of six observations was drawn from a table of random normal numbers and a 
randomly selected observation was increased by two standard deviations. The observations 
obtained were 265, 223, 291, 105, 43 and 477. A sample of six observations was drawn from 
a table with the same variance but with a different mean to give an independent estimate 
of variance. These observations were 171, 111, 185, 271, 68 and 217. The mean and sum of 
squares about the mean for the first set of observations are 234 and 116,504 respectively. 
The sum of squares about the mean for the second set is 26,519. So the one-sided test 
statistic is from (2-1) 

$$b = \frac{477-234}{(143{,}023)^{1/2}} = 0.643.$$







Table 1 gives $b(0.05; 6, 5) = 0.638$, so that the observation 477 is rejected at the 0.05 level. 
Table 3 gives $b^{*}(0.05; 6, 5) = 0.681$, so that the observation 477 is not rejected at the 0.05 
level by the two-sided test.* 
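
The arithmetic of Example 2 can be checked with the short sketch below. The critical values 0.638 and 0.681 are those quoted above from Tables 1 and 3; the variable names are illustrative. Computed directly from the six listed observations, the first sum of squares comes out at 116,502 rather than the printed 116,504, a difference too small to affect the conclusions.

```python
import math

first = [265, 223, 291, 105, 43, 477]     # sample containing the shifted observation
second = [171, 111, 185, 271, 68, 217]    # independent sample, used for variance only

def sum_of_squares(x):
    """Sum of squared deviations about the mean."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

mean_first = sum(first) / len(first)
S = math.sqrt(sum_of_squares(first) + sum_of_squares(second))

b = (max(first) - mean_first) / S                      # one-sided statistic
b_star = max(abs(v - mean_first) for v in first) / S   # two-sided statistic

reject_one_sided = b > 0.638        # 5% point b(0.05; 6, 5) quoted from Table 1
reject_two_sided = b_star > 0.681   # 5% bound b*(0.05; 6, 5) quoted from Table 3
```

Both statistics equal about 0.643 here, since 477 is the most extreme observation in either direction; it is rejected by the one-sided test but not by the two-sided test.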
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APPENDIX 
The tables 


Tables 1 and 2 give the 5 and 1% points, respectively, of $b$ [see (2-2)]. The 1% points are correct 
to 1 unit in the fourth place. Only a few values for $n$ and $\nu$ large are questionable at all in the last digit. 
The 5% points of $b$ are correct to three places except for a few large values of $n$ and $\nu$. For $n = 20$ 
and $\nu = 50$ the value given may be as much as 2 units too large in the third place. No other values in 
the 5% table are incorrect by more than one unit in the third place. 

Tables 3 and 4 give lower and upper bounds for the percentage points of $b^{*}$ [see (2-3)]. For each 
combination of parameter values, lower and upper bounds are given except when these bounds agree 
to three decimal places and then only one value is given. 

When an asterisk (*) replaces the lower bound, the correction term in equation (4-5) is zero, i.e. 
(upper - lower) = 0. When a horizontal bar (-) replaces the lower bound, the correction term in (4-5) 
is not zero but so small that the lower agrees with the upper bound to three places, i.e. 
(upper - lower) < 0.001. 


[* This example was introduced simply to illustrate the method of calculation of the statistic and 
the use of the tables; presumably when dealing with a real problem it would be clear to the 
statistician whether a single or two-sided test was the more appropriate. Ed.] 
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Fok 
6 0-8154 
1 0-789 
2 ‘741 
3 *692 
4 -648 
5 -610 
6 0-577 
7 -549 
8 *524 
9 +502 
10 -483 
12 0-450 
15 411 
20 +363 
24 +335 
30 +303 
40 -266 
50 239 
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+5489 
+4962 
+4633 
+4240 
+3763 
+3416 


Table 1. Upper 5% points of $(x_{\max}-\bar{x})/S$ 


6 


5566 
5055 
+4732 
-4346 
-3869 
+3519 


7 


7 
0-8566 


0-8263 
*7971 
-7698 
+7444 
*7207 


0-6987 
-6786 
*6599 
-6424 
-6263 


0-5971 
-5600 
+5106 
-4792 
-4412 
+3940 
*3591 


0-768 


*305 


0-8394 


0-8104 
*7833 
“7579 
‘7341 
-7120 


0-6918 
-6729 
*6554 
*6389 
*6237 


0-5962 
+5607 
+5132 
+4826 
-4455 
-3990 
+3642 


9 
0-746 


0-714 
686 
661 
*638 
617 


0-598 
+580 
+564 
-550 
-536 


0-511 
-480 
+438 
411 
+379 
+339 
*310 


9 
0-8211 


0-7942 
-7688 
*7450 
“7229 
-7026 


0-6837 
-6659 
6495 
*6341 
-6198 


0-5935 
*5597 
-5140 
-4844 
-4480 
-4023 
+3681 


10 


0-725 


0-697 
672 
648 
*627 
-608 


0-591 
-574 
-559 
-546 
+533 


0-509 
-479 
+439 
413 
+382 
+342 
+313 


Table 2. Upper 1% points of $(x_{\max}-\bar{x})/S$ 


10 
0-8032 


0-7780 
*7541 
+7320 
-7116 
-6926 


0-6748 
*6581 
*6428 
*6284 
-6148 


0-5899 
-5576 
+5136 
-4850 
-4496 
-4047 
3711 


0-7687 


0-7465 
+7260 
-7070 
-6890 
*6724 


0-6548 
*6422 
*6286 
*6156 
6031 


0-5808 
-5513 
-5104 
-4837 
-4501 
4071 
+3744 


Note: $S^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{j=1}^{\nu+1}(y_j-\bar{y})^2$, with $y_j$ independent of $x_i$. 


0-644 


0-626 


566 


0-553 


-510 


0-492 


+322 


0-7228 


0-7048 
-6879 
-6723 
-6576 
*6438 


0-6306 
-6182 
-6066 
+5956 
-5851 


0-5654 
+5393 
-5031 
*4785 
-4479 
-4074 
+3766 
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40 


50 





20 
0-586 


0-574 
562 
+550 
540 
+530 


0-520 
“511 
502 
+494 
486 


0-472 
+452 
+424 
“405 
381 
+349 
+323 
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40 


50 


Table 3. Bounds for upper 5% points of $\max_i |x_i-\bar{x}|/S$ 


5 


0-857 


* 


0-820 
* 
0-782 
* 


0-746 
* 


0-713 
+: 


0-384 


0-657 


0-633 
0-611 
-610 
0-592 
-590 
0-573 
*572 
0°542 
+539 
0-502 
-499 
0-452 
447 
0-421 
-416 
0-384 
+379 
0-340 
+334 
0-308 
+302 


6 


0-844 


*x 


0-807 
+ 


0-771 
* 
0-738 
* 
0-708 
* 


0-681 


0-657 


0-635 


0-614 
0-596 
-595 
0-579 
-578 
0-549 
+547 
0-511 
-509 
0-462 
-459 
0-432 
*428 
0-395 
391 
0-351 
+346 
0-319 
+314 


7 


0-825 
* 


0-789 
* 
0:757 
* 


0-727 
* 


0-670 


0-675 


0-652 


0-632 


0-613 
0-596 
-595 
0-580 
-579 
0-551 
+550 
0-515 
514 
0-468 
-466 
0-438 
+436 
0-403 
*399 
0-359 
+355 
0-326 
+323 
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0-804 
* 


0-771 
* 


0-741 
* 
0-714 
* 


0-689 


0-667 


0-646 


0-627 


0-609 


0-593 
0-579 
*578 
0-551 
+550 
0-517 
-516 
0-471 
-469 
0-442 
+440 
0-408 
-405 
0-364 
+361 
0-332 
+328 


9 
0-783 
* 
0-753 
* 
0-726 
* 


0-701 


0-678 


0-658 


0-638 


0-621 


0-604 


0-550 
+549 
0-517 
-516 
0-473 
471 
0-445 
443 
0-411 
-408 
0-368 
+365 
0-336 
+333 


10 


0-763 
* 
0-736 
« 


0-711 
* 


0-688 


0-667 


0-648 


0-630 


0-614 


0-598 


0-584 


0-571 
0-547 
-546 
0-515 
-514 
0-473 
*472 
0-446 
+444 
0-413 
“411 
0-371 
+368 
0-339 
+336 


Note: $S^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{j=1}^{\nu+1}(y_j-\bar{y})^2$, with $y_j$ independent of $x_i$. 


* indicates (upper—lower bound) = 0. 
— indicates (upper—lower bound) < 0-001. 































3 4 5 
oh ’ 
0 


0-8165 0-864 0-881 


* * * 
1 0-814 0-851 0-862 
- * * 


2 0-800 0-830 0-837 
* * * 

3 0-778 0-805 0-809 
* * * 


4 0-751 0-777 0-782 
* * * 

5 0-724 0-750 0-757 
* * 


6 0-698 0-725 0-733 


* * 


7 0-673 0-702 0-711 


8 0-651 0-680 0-690 


9 0-629 0-659 0-671 


10 0-610 0-640 0-653 


609 ae — 
12 0576 0-607 0-621 
573 pst wish 
15 0533 0-564 0-580 
529 wo od 
20 0-478 0-509 «0526 
473° = 508-525 
24 40-444. «0-475 «(0-492 
439 +473 -491 
30 0-405 = 0-484Ss«0-451 
399 -432 = -450 
40 0357 0-385 0-401 
351 382 +399 
50 0323 «0-349 0865 
‘317-346-3863 


6 


0-755 
* 
0-733 
* 
0-712 
ok 


0-692 
0-674 


0-657 


0-626 


0-587 


0-534 


0-501 
0-461 
-460 
0-411 
-410 
0-375 
+373 


* indicates (upper-lower bound) = 0. 
— indicates (upper—lower bound) < 0-001. 


7 
0-874 
* 


0-847 


a 


0-821 
* 
0-796 
* 
0-772 
* 
0-749 
* 


0-728 


* 


0-708 


0-690 


0-673 


0-657 


0-628 


0-590 


0-539 


0-507 
0-468 
-467 
0-418 
“417 
0-382 
381 
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9 
0-843 
* 


0-819 


* 


0-795 
* 
0-772 
* 


0-751 


* 


0-731 
* 


0-713 


* 


0-695 


0-679 


0-664 


0-649 


0-623 


0-588 
0-542 
0-511 
0-474 
473 
0-426 
-426 
0-390 
*389 


Table 4. Bounds for upper 1% points of $\max_i |x_i-\bar{x}|/S$ 


10 


12 
0-794 
* 
0-773 
* 
0-753 
Oo” 
0-735 
* 


0-717 
* 


0-701 


0-685 


0-671 


0-657 


0-644 


0-632 


0-609 


0-579 


0-537 


0-509 
0-475 
474 
0-430 
*429 
0-396 
+395 


Note: $S^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{j=1}^{\nu+1}(y_j-\bar{y})^2$, with $y_j$ independent of $x_i$. 
The ergodic behaviour of random walks 


By J. F. C. KINGMAN 
Statistical Laboratory, University of Cambridge 


1. Many stochastic processes occurring in practice may be formulated as Markov chains 
with an enumerable state space. It is then important to know whether or not the chain is 
ergodic, i.e. whether or not a stationary distribution exists. For particular problems this 
has often been determined by complex and ingenious methods, a good example being the 
analysis by Kiefer & Wolfowitz (1955) of the many-server queue. However, Foster (1953) 
has given a general criterion for a chain to be ergodic, and the purpose of this paper is to 
examine the way in which his result may be applied to particular processes. By way of 
example, the technique is applied to two important problems in queueing theory. The 
results obtained also have consequences in the theory of multi-dimensional random walks, 
in which context ergodicity simply means that the particle does not escape to infinity. 

The processes with which we will be concerned are those which may be formulated as 
Markov chains on the state space of vectors 


$$\mathbf{x} = (x_1, \ldots, x_n),$$


where the $x_i$ are integers. We shall assume that the chain is irreducible and aperiodic, and 
we shall denote the probability of a transition from $\mathbf{x}$ to $\mathbf{x}+\mathbf{y}$ by $p(\mathbf{x}, \mathbf{y})$. Then we have the 
following result, due to Foster (1953, Theorems 2 and 3): 

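The chain is ergodic if and only if there exists a positive function $\phi(\mathbf{x})$ such that, for all $\mathbf{x}$, 


$$\sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})\,\phi(\mathbf{x}+\mathbf{y}) < \infty \tag{1}$$

and such that, for all but a finite number of $\mathbf{x}$, 

$$\sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})\,\phi(\mathbf{x}+\mathbf{y}) \le \phi(\mathbf{x}) - 1. \tag{2}$$


(Foster states his theorem for the case where (2) holds for all $\mathbf{x} \ne \mathbf{0}$. His proof, however, 
goes through unchanged in this slightly more general case.) 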

If we have a particular chain, and we wish to prove it ergodic, it is necessary to construct 
a function $\phi(\mathbf{x})$ satisfying (1) and (2). In the simple cases considered by Foster, a simple 
linear function of $\mathbf{x}$ sufficed, but, as will appear later, more complicated functions are often 
necessary. Our object is to give a method for constructing $\phi$. This will be done in a heuristic 
manner, and it should be emphasized that it is necessary, having constructed $\phi$, to verify 
that it does indeed satisfy Foster's criterion. There are, however, certain special cases in 
which a more precise analysis can be given. 

The technique to be described is most fruitful when the process is a random walk in some 
region, possibly with complicated boundary conditions. It is, however, by no means 
restricted to this case. 


2. In random walks, a common situation will be that the length of a step is bounded, 
so that, for any $\mathbf{x}$, $p(\mathbf{x}, \mathbf{y}) = 0$ for all but finitely many $\mathbf{y}$. In this case (1) will be trivially 
satisfied. To examine (2), consider the effect of approximating $\phi(\mathbf{x}+\mathbf{y})$ by 


$$\phi(\mathbf{x}+\mathbf{y}) = \phi(\mathbf{x}) + \mathbf{y}\cdot\operatorname{grad}\phi(\mathbf{x}). \tag{3}$$

Here we are regarding $\phi$ as a (differentiable) function of the $n$ real variables $x_i$, and, as usual, 
$\operatorname{grad}\phi$ is the vector whose $i$th component is $\partial\phi/\partial x_i$. Then (2) becomes 


$$\boldsymbol{\mu}\cdot\operatorname{grad}\phi(\mathbf{x}) \le -1, \tag{4}$$


where 

$$\boldsymbol{\mu}(\mathbf{x}) = \sum_{\mathbf{y}} \mathbf{y}\,p(\mathbf{x}, \mathbf{y}) \tag{5}$$


will be called the drift vector at $\mathbf{x}$. 

If we now look at the level surfaces of $\phi(\mathbf{x})$, we see that, in order that $\phi(\mathbf{x})$ should always 
be positive and that $|\operatorname{grad}\phi|$ be bounded away from zero, each level surface must enclose 
a region, which will usually be bounded, in which $\phi$ takes smaller values than it does on 
the surface. In order that $\boldsymbol{\mu}\cdot\operatorname{grad}\phi$ be negative, $\boldsymbol{\mu}(\mathbf{x})$ must point into the region enclosed 
by the level surface through $\mathbf{x}$. 

Hence, given the configuration of drift vectors, we have to construct a family of surfaces 
in the region of the walk which are nested inside one another, and such that $\boldsymbol{\mu}(\mathbf{x})$ points 
into the interior of the surface through $\mathbf{x}$. We then choose $\phi(\mathbf{x})$ as a suitable monotonic 
labelling of the surfaces. This is a purely geometrical problem, and whether or not it is 
possible depends on the configuration of the drift vectors. 

By looking at these vectors, we can gain an idea as to whether or not the chain is likely 
to be ergodic. For example, if $n = 1$, and the walk is restricted to $x_1 \ge 0$, we will expect 
the chain to be ergodic if, for all $x$, $\mu_1(x) < 0$. That this is in fact so has been shown, in a 
particular case, by Lindley (1952). This method has, however, relatively little interest in 
the case $n = 1$, when the strong law of large numbers can be readily invoked. 
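
As a minimal sketch of the one-dimensional case, the drift of a bounded step distribution can be computed directly and a linear function $\phi(x) = Ax$ checked against the drift condition. The step distribution and the helper names `drift` and `expected_change` are illustrative assumptions, not taken from the paper; the check applies away from the boundary only.

```python
from fractions import Fraction

def drift(step_probs):
    """Mean step, sum over y of y * p(y), for a distribution {step: probability}."""
    return sum(Fraction(y) * p for y, p in step_probs.items())

def expected_change(phi, x, step_probs):
    """E[phi(x + Y)] - phi(x) for one step Y taken away from the boundary."""
    return sum(p * phi(x + y) for y, p in step_probs.items()) - phi(x)

# Hypothetical interior steps: up one with probability 1/3, down one with probability 2/3.
steps = {+1: Fraction(1, 3), -1: Fraction(2, 3)}
mu = drift(steps)              # negative drift, so ergodicity is to be expected
A = -1 / mu                    # scale phi so that the expected change is exactly -1
phi = lambda x: A * x

change = expected_change(phi, 10, steps)
```

Because $\phi$ is linear here, the first-order approximation is exact, so the expected change equals $A\mu_1$ at every interior point.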


3. A less trivial illustration is that of a random walk in the first quadrant of the plane. 
For simplicity we assume that the steps are of unit length, and have the same distributions 
at all points of the interior $x_1 > 0$, $x_2 > 0$. The boundary conditions are arbitrary except 
that they are constant along each straight segment of the boundary. (It is straightforward 
to extend the analysis to a more general situation.) 

Under these assumptions we have 

$$\boldsymbol{\mu}(\mathbf{x}) = \begin{cases} \boldsymbol{\mu} & (x_1 > 0,\ x_2 > 0), \\ \boldsymbol{\mu}' & (x_1 > 0,\ x_2 = 0), \\ \boldsymbol{\mu}'' & (x_1 = 0,\ x_2 > 0), \end{cases}$$

where $\boldsymbol{\mu}$, $\boldsymbol{\mu}'$, $\boldsymbol{\mu}''$ are vectors, with 

$$\mu_2' \ge 0, \qquad \mu_1'' \ge 0.$$


Now consider the level-curves of $\phi(\mathbf{x})$. These must meet $x_2 = 0$ at such an angle that 
$\boldsymbol{\mu}$, $\boldsymbol{\mu}'$ both point inside them. This is only possible if the directions of $\boldsymbol{\mu}$, $\boldsymbol{\mu}'$ satisfy a certain 
condition. A similar consideration on $x_1 = 0$ gives a condition between $\boldsymbol{\mu}$ and $\boldsymbol{\mu}''$. We 
distinguish three cases. 


(i) $\mu_1 < 0$, $\mu_2 < 0$. 
In this case the configuration of the level-curves will be as shown in Fig. 1, and the 
condition for the construction to be possible is clearly 


$$\frac{\mu_1}{\mu_2} > \frac{\mu_1'}{\mu_2'}, \qquad \frac{\mu_2}{\mu_1} > \frac{\mu_2''}{\mu_1''}.$$
The form of the curves suggests that $\phi(\mathbf{x})$ should be a quadratic function of $\mathbf{x}$ and, in fact, 
it is not difficult to show that 

$$\phi(\mathbf{x}) = a x_1^2 - 2h x_1 x_2 + b x_2^2$$

satisfies the conditions of Foster’s result, so long as $a$, $b$, $h$ are chosen so that 


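$$\frac{\mu_1'}{\mu_2'} < \frac{h}{a} < \frac{\mu_1}{\mu_2} \quad\text{and}\quad \frac{\mu_2''}{\mu_1''} < \frac{h}{b} < \frac{\mu_2}{\mu_1}.$$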
Hence this simple geometrical argument gives, in a rigorous manner, a sufficient condition 
for the walk to be ergodic. We note that the fact that $\operatorname{grad}\phi \to 0$ as $\mathbf{x} \to 0$ does not affect 
the argument, as equation (2) may fail to hold at $\mathbf{x} = 0$. 


(ii) 4, < 0, Wy < 0. 

From Fig. 2 the condition is easily seen to be 
p .. Wee ge 2 
“kh ff; 


We see that, in general, a quadratic function will not suffice for $\phi(\mathbf{x})$. 





(iii) $\mu_1 \ge 0$, $\mu_2 \ge 0$. 

In this case it is clear that, however we draw the level-curves, $\boldsymbol{\mu}$ will always point away 
from their interior. 

We may summarize the results in the following statement, which is broadly true for 
other multi-dimensional random walks (although it is not difficult to construct pathological 
counter-examples): 

If the drift vector in the interior of the walk points to some part of the boundary, and if, 
when we consider the drift vectors off that part, they either converge (in the sense that any 
two vectors meet in the interior) or point towards the other part of the boundary, then the 
walk is ergodic. 

This has the important consequence that, unlike the situation in the one-dimensional 
walk, the criterion for ergodicity in several dimensions depends critically on the boundary 
conditions. For instance, in the case dealt with above, even the conditions $\mu_1 < 0$, $\mu_2 < 0$ 
were not enough to ensure ergodicity. 


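4. In the above example, it was possible to take the level-surfaces of $\phi(\mathbf{x})$ to be geometrically 
similar, so that $\phi$ could be taken as a homogeneous function of $\mathbf{x}$. In this case the 
status of the linear approximation (3) is not merely that of a heuristic convenience. In fact, 
if $\phi$ is homogeneous of degree $k$, and if $x = |\mathbf{x}|$, then 

$$\begin{aligned}
\phi(\mathbf{x}+\mathbf{y}) &= x^k\,\phi(\mathbf{x}/x + \mathbf{y}/x) \\
&= x^k\,[\phi(\mathbf{x}/x) + (\mathbf{y}/x)\cdot\operatorname{grad}\phi(\mathbf{x}/x) + O(1/x^2)] \\
&= \phi(\mathbf{x}) + \mathbf{y}\cdot\operatorname{grad}\phi(\mathbf{x})\,\{1 + O(1/x)\}.
\end{aligned}$$

Hence, if $\boldsymbol{\mu}\cdot\operatorname{grad}\phi(\mathbf{x}) < -1$, then, for $x$ sufficiently large, 

$$\sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})\,\phi(\mathbf{x}+\mathbf{y}) < \phi(\mathbf{x}) - \tfrac{1}{2}$$

and, replacing $\phi$ by $2\phi$, the conditions for ergodicity are satisfied. 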


5. As an example of the use of this method, we consider the problem of two queues in 
parallel. For a detailed description of this problem, we refer to Haight (1958) and Kingman 
(1961). It may be formulated as a random walk in the sector $x_1 \ge x_2 \ge 0$ of the plane, where 
$x_1$ is the length of the longer queue and $x_2$ that of the shorter. Strictly speaking, this is a 
walk in continuous time, but it is not difficult to see (cf. Kingman, 1961) that, under suitable 
regularity conditions, the analysis carries through as before. 

The transitions of the process are as follows (with mean service time 1, and arrival rate $2\rho$): 

$$\begin{aligned}
(x_1, x_2) &\to (x_1, x_2+1): && \text{intensity } 2\rho && (x_1 > x_2), \\
(x_1, x_2) &\to (x_1+1, x_2): && \text{intensity } 2\rho && (x_1 = x_2), \\
(x_1, x_2) &\to (x_1-1, x_2): && \text{intensity } 1 && (x_1 > x_2), \\
(x_1, x_2) &\to (x_1, x_2-1): && \text{intensity } 1 && (x_1 > x_2 > 0), \\
 & && \text{intensity } 2 && (x_1 = x_2 > 0).
\end{aligned}$$

Hence we have 

$$\boldsymbol{\mu}(\mathbf{x}) = \begin{cases} (-1,\ 2\rho-1) & (x_1 > x_2 > 0), \\ (2\rho,\ -2) & (x_1 = x_2 > 0), \\ (-1,\ 2\rho) & (x_1 > x_2 = 0), \\ (2\rho,\ 0) & (x_1 = x_2 = 0). \end{cases}$$


If $\rho < 1$, it is easy to see that suitable level-curves are $x_1^2 + x_2^2 = \text{constant}$, and in fact it 
is readily verified that $\phi(\mathbf{x}) = A(x_1^2 + x_2^2)$ satisfies the conditions of Foster’s theorem for 
some $A$. Hence the queue is ergodic when $\rho < 1$. If it had been attempted to prove this 
result by guessing $\phi$, and, for instance, by trying linear functions of $\mathbf{x}$, this result would not 
have been proved, and the geometrical standpoint shows why this is so. 
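
The claim about quadratic versus linear functions can be checked numerically. The sketch below encodes the transition intensities listed above, evaluates the rate of change of $\phi(\mathbf{x}) = x_1^2 + x_2^2$ over a block of states, and searches a grid of non-negative linear functions $a x_1 + b x_2$. The value $\rho = 9/10$, the grid, and the function names are illustrative assumptions rather than anything specified in the paper.

```python
from fractions import Fraction

def transitions(x1, x2, rho):
    """Jumps (direction, intensity) of the two-queue walk at (x1, x2), x1 >= x2 >= 0."""
    jumps = []
    if x1 > x2:
        jumps.append(((0, +1), 2 * rho))    # arrival joins the shorter queue
        jumps.append(((-1, 0), 1))          # departure from the longer queue
        if x2 > 0:
            jumps.append(((0, -1), 1))      # departure from the shorter queue
    else:
        jumps.append(((+1, 0), 2 * rho))    # equal queues: one of them becomes the longer
        if x2 > 0:
            jumps.append(((0, -1), 2))      # either server finishing shortens one queue
    return jumps

def drift(x1, x2, rho):
    """Drift vector: sum of intensity * jump direction."""
    js = transitions(x1, x2, rho)
    return (sum(r * d[0] for d, r in js), sum(r * d[1] for d, r in js))

def rate_of_change(phi, x1, x2, rho):
    """Sum over jumps of intensity * (phi(new state) - phi(old state))."""
    return sum(r * (phi(x1 + d[0], x2 + d[1]) - phi(x1, x2))
               for d, r in transitions(x1, x2, rho))

def linear_ok(a, b, rho):
    """True if a*x1 + b*x2 has negative drift in the interior and on both edges."""
    states = [(5, 2), (5, 5), (5, 0)]       # interior, x1 = x2, x2 = 0
    return all(m1 * a + m2 * b < 0
               for m1, m2 in (drift(x1, x2, rho) for x1, x2 in states))

rho = Fraction(9, 10)
quad = lambda x1, x2: x1 * x1 + x2 * x2
worst = max(rate_of_change(quad, x1, x2, rho)
            for x1 in range(13, 41) for x2 in range(x1 + 1))

# Linear candidates must be non-negative on the sector: a >= 0 and a + b >= 0.
grid = [Fraction(k, 10) for k in range(-30, 31)]
linear_found = any(linear_ok(a, b, rho)
                   for a in grid for b in grid if a >= 0 and a + b >= 0)
```

For this $\rho$ the quadratic function decreases at rate at least 1 on every state examined, while no admissible linear function on the grid has negative drift on all three parts of the sector. For small $\rho$ (for example $\rho = 1/2$) a linear function does exist, which is consistent with the geometrical picture.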


6. As a second example, we consider the Kiefer-Wolfowitz (1955) analysis of the many-server 
queue. For simplicity, we consider the two-server case. There it is shown that the 
waiting time in a two-server queue can be deduced from the following random walk in 
the sector $x_1 \ge x_2 \ge 0$: 

$$\mathbf{x} \to \mathbf{x}',$$

where 

$$x_1' = [\max(x_1, x_2+s) - t]^+, \qquad x_2' = [\min(x_1, x_2+s) - t]^+.$$

Here $s$, $t$ represent the service and inter-arrival times, and $y^+ = \max(y, 0)$. 

In fact, the $x_i$ are not constrained to take only integer values, but Kiefer & Wolfowitz 
showed that it was sufficient to consider the case in which $x_i$, $s$, $t$ take on values which are the 
multiples of some number $\omega$, and it is clear that, without loss of generality, we may take 
$\omega = 1$. The result obtained by these authors is that, so long as $Es < 2Et$, the process is 
ergodic. 

The drift vectors are easily seen to be: 

in the interior: $(-\bar{t},\ \bar{s}-\bar{t})$, 
on $x_1 = x_2$: $(\bar{s}-\bar{t},\ -\bar{t})$, 

where $\bar{y} = Ey$. At points in the interior near the boundaries, $\boldsymbol{\mu}$ will lie between the value 
given and the value on the boundary. It is not immediately clear what we should take as 
the value of $\boldsymbol{\mu}$ on $x_2 = 0$, except that $\mu_1 < 0$, $\mu_2 > 0$. If we had 

$$\frac{\mu_2}{-\mu_1} < 1$$

we could clearly use the linear function $\phi(\mathbf{x}) = x_1 + x_2$. In fact, the Kiefer-Wolfowitz 
analysis depends critically on this quantity, and the boundary condition at $x_2 = 0$ causes 
a great deal of difficulty. However, from the geometrical picture, we can see that, whatever 
the value of $-\mu_2/\mu_1$, the function $\phi(\mathbf{x}) = x_1^2 + x_2^2$ is suitable. In fact, for fixed $x_1$, $x_2$, 


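$$\begin{aligned}
E(x_1'^2 + x_2'^2) &= E[\{(x_1-t)^+\}^2 + \{(x_2+s-t)^+\}^2] \\
&\le E[(x_1-t)^2 + (x_2+s-t)^2] \\
&= x_1^2 + x_2^2 - 2x_1\bar{t} + 2x_2(\bar{s}-\bar{t}) + O(1) \quad\text{as } x \to \infty \\
&\le x_1^2 + x_2^2 - 2x_1(2\bar{t}-\bar{s}) + O(1).
\end{aligned}$$

Hence, taking $\phi(\mathbf{x}) = A(x_1^2 + x_2^2)$, for some $A$, satisfies the conditions of Foster’s theorem. 
It is easy to show that (1) is also satisfied. 
For the $k$-server queue, we use similarly $\phi = A\sum x_i^2$. 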
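
The one-step calculation for the two-server recursion can be reproduced exactly for integer-valued $s$ and $t$. The sketch below assumes hypothetical two-point distributions with $Es = 2$ and $Et = 3/2$ (so $Es < 2Et$, although a single server would be overloaded); the distributions and the names `step` and `expected_change` are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def step(x1, x2, s, t):
    """One step: x1' = [max(x1, x2+s) - t]^+, x2' = [min(x1, x2+s) - t]^+."""
    hi, lo = max(x1, x2 + s), min(x1, x2 + s)
    return max(hi - t, 0), max(lo - t, 0)

def expected_change(phi, x1, x2, s_dist, t_dist):
    """E[phi(x')] - phi(x), with s and t independent and given as {value: probability}."""
    total = sum(ps * pt * phi(*step(x1, x2, s, t))
                for (s, ps), (t, pt) in product(s_dist.items(), t_dist.items()))
    return total - phi(x1, x2)

# Hypothetical service and inter-arrival time distributions (E s = 2, E t = 3/2).
s_dist = {1: Fraction(1, 2), 3: Fraction(1, 2)}
t_dist = {1: Fraction(1, 2), 2: Fraction(1, 2)}
quad = lambda x1, x2: x1 * x1 + x2 * x2

# Expected change of x1^2 + x2^2 on the diagonal, in the interior, and on x2 = 0.
changes = {x: expected_change(quad, x[0], x[1], s_dist, t_dist)
           for x in [(20, 20), (20, 10), (20, 0)]}
```

Away from the boundary $x_2 = 0$ the change equals $-2x_1\bar{t} + 2x_2(\bar{s}-\bar{t}) + Et^2 + E(s-t)^2$ exactly, because the sum of the squares of the maximum and the minimum does not depend on which argument is the larger.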


7. We therefore see that Foster’s criterion can be a powerful method of proving Markov 
processes ergodic. Whilst the technique presented in this paper is not in itself rigorous, it 
nevertheless suggests the methods by which a rigorous proof may be most easily constructed. 
It breaks down, however, when the mean vector vanishes, as in symmetric 
walks, and a more delicate treatment is necessary. We might, for instance, use the 
second-order approximation 

$$\phi(\mathbf{x}+\mathbf{y}) = \phi(\mathbf{x}) + \sum_i y_i \frac{\partial\phi}{\partial x_i} + \tfrac{1}{2}\sum_{i,j} y_i y_j \frac{\partial^2\phi}{\partial x_i\,\partial x_j},$$

leading to the condition 

$$\tfrac{1}{2}\sum_{i,j} \mu_{ij}\,\frac{\partial^2\phi}{\partial x_i\,\partial x_j} < -1,$$

where 

$$\mu_{ij} = \sum_{\mathbf{y}} p(\mathbf{x}, \mathbf{y})\, y_i y_j.$$


Another problem to which the present work is not applicable is that of absorbing barriers. 
In this case the chain is not irreducible, and cannot be ergodic. However, if we can replace 
the absorbing barriers by others which are such that the chain is irreducible and ergodic 
then the probability of ultimate absorption is unity, since in an ergodic chain there is 
probability one of ultimate transition from any one state to any other. The converse of 
this result is false. 

There is, however, a wide class of processes occurring in practice which may be for- 
mulated as random walks, and for these it is suggested that the analysis of this paper 
provides a fruitful method for examining their ergodic properties. 


I am indebted to Prof. P. Whittle for his encouragement, and to the Department of 
Scientific and Industrial Research for a Research Studentship. 
The goodness-of-fit of a single (non-isotropic) 
hypothetical principal component 


By A. M. KSHIRSAGAR 
University College London* 


1. INTRODUCTION 


The goodness-of-fit of a single hypothetical discriminant function in the problem of 
discrimination between several groups has been considered by Bartlett (1951a). Virtually, 
this is the same as testing the adequacy of a single canonical variable to bring out completely 
the relationship between two vector variables. A single hypothetical canonical variable will 
be adequate in this ‘canonical’ or ‘external’ analysis, if (i) there is only one non-zero 
canonical correlation, and (ii) the direction of the hypothetical canonical variable coincides 
with that of the true canonical variable corresponding to the non-zero canonical correlation. 
Bartlett (1951 a) and Williams (1952, 1955) have considered these two aspects of ‘collinearity’ 
and ‘direction’ in detail, and have derived certain exact tests for them. 

This situation in ‘external’ analysis has a parallel in ‘internal’ or ‘ principal components’ 
analysis. If the population variance-covariance matrix of a number of variables has all its 
latent roots equal, except one, then the ‘ principal component’ or latent vector corresponding 
to this ‘anomalous’ root can be called the single ‘non-isotropic’ component because if this 
component is removed, the variation in any other orthogonal direction is isotropic. In 
geometrical language, all the principal axes of the familiar ellipsoid (given by the exponent 
in the density function of a multivariate normal distribution), except one, are equal; in other 
words, the ellipsoid does not degenerate into a sphere because of this exceptional principal 
axis. Such a situation can arise in factor-analysis if there is only one ‘common’ factor (for 
details see §2) besides the ‘specific’ factor in the factor-structure. If, therefore, we are 
testing the adequacy of a single hypothetical non-isotropic principal component, it is 
necessary—as in the case of ‘external’ analysis—to consider the following two aspects: 
(i) departure from the hypothesis due to there being more than one non-isotropic principal 
component, and (ii) departure due to deviation in direction of the true principal component 
from the hypothetical one. 

In § 2 of this paper, these two aspects are considered and an over-all $\chi^2$ test is obtained, 
and the underlying assumptions are stated clearly. In § 3, the over-all $\chi^2$ is partitioned and 
its ‘direction’ part $\chi_d^2$ is separated, using a geometrical argument similar to the one used by 
Bartlett (1951a) in deriving the ‘direction factor’ in external analysis. In § 4, $\chi_d^2$ is expressed 
in terms of rectangular co-ordinates. In § 5, $\chi_d^2$ is expressed in terms of the sample latent 
roots and principal components in a manner similar to that of Williams (1955). This is also 
alternatively expressed in terms of ‘residual’ roots of the sample variance-covariance 
matrix, following Williams’s (1952) idea of ‘residual roots’. In § 6, the problem of the 
distribution of these residual roots is considered; and finally in § 7, the theory is illustrated 
by a numerical example and a method of deriving approximate confidence intervals for the 
elements of the single non-isotropic component is described and illustrated. 


* Present address: University of Bombay. 
2. ADEQUACY OF A SINGLE NON-ISOTROPIC PRINCIPAL COMPONENT 


Let $\mathbf{x}' = [x_1, x_2, \ldots, x_p]$ be a (row) vector having a $p$-variate normal distribution with zero 
means and variance-covariance matrix $\Sigma$. There is an orthogonal matrix 

$$L = [l_{ij}] = [\mathbf{l}_{(1)} \,|\, \mathbf{l}_{(2)} \,|\, \cdots \,|\, \mathbf{l}_{(p)}]' \tag{2.1}$$

such that 

$$\Sigma = L' \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2)\, L, \tag{2.2}$$

where $\sigma_i^2$ ($i = 1, 2, \ldots, p$) are the latent roots of $\Sigma$ and $\mathbf{l}_{(i)}$ are the corresponding (column) 
latent vectors; diag. stands for a diagonal matrix, the elements of which are written in the 
adjoining bracket. If the roots are arranged in descending order of magnitude as 

$$\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_p^2,$$

then $\mathbf{l}_{(i)}'\mathbf{x}$ is called the $i$th principal component. The transformation 

$$\mathbf{y}' = [y_1, y_2, \ldots, y_p] = \mathbf{x}'L' \tag{2.3}$$

to the principal components shows that the $y_i$ are normal, independent variables with zero 
means and variances $\sigma_i^2$ ($i = 1, 2, \ldots, p$). If $\Sigma$ has only one latent root $\sigma_1^2$ different from all the 
remaining $(p-1)$ roots, which are all equal to $\sigma^2$ ($< \sigma_1^2$), say, then 

$$\Sigma = L' \operatorname{diag}(\sigma_1^2, \sigma^2, \ldots, \sigma^2)\, L = (\sigma_1^2 - \sigma^2)\,\mathbf{l}_{(1)}\mathbf{l}_{(1)}' + \sigma^2 I, \tag{2.4}$$
where $I$ is the identity matrix. Thus $\mathbf{l}_{(1)}'\mathbf{x}$ is the only non-isotropic principal component and 
$\Sigma$ is completely determined by $\mathbf{l}_{(1)}$, $\sigma_1^2$ and $\sigma^2$. Lawley (1953) has made use of such a 
representation of $\Sigma$ in factor analysis when there is only one ‘common’ factor in addition to the 
‘specific’ factors, so that the factor-structure is 

$$x_i = g_i f + e_i \quad (i = 1, 2, \ldots, p), \tag{2.5}$$

where $x_i$ is the $i$th test variate, $g_i$ is the ‘loading’ of the $i$th test in the common factor $f$ and $e_i$ 
is the factor specific to the $i$th test. Lawley assumes that $f$ and $e_i$ are all normal independent 
variables; the variance of $f$ is unity and the units of measurement are so chosen that the $e_i$ 
have the same variance $\sigma^2$, which is also assumed to be known. Under these assumptions, 
the population variance-covariance matrix is 

$$\Sigma = \mathbf{g}\mathbf{g}' + \sigma^2 I, \tag{2.6}$$

where $\mathbf{g}' = [g_1, \ldots, g_p]$ is the vector of loadings. Lawley then proceeds to use (2.4) to obtain 
a unique definition of $\mathbf{g}$ by defining it as 

$$\mathbf{g} = (\sigma_1^2 - \sigma^2)^{1/2}\,\mathbf{l}_{(1)}.$$
Since $\sigma^2$ is assumed to be known, there is no loss of generality in taking it to be unity. The 
modifications in the tests of significance derived in this paper, if $\sigma^2$ has some other value, are 
obvious. If $\sigma^2$ is not known, it can be estimated by the average of the $(p-1)$ roots remaining 
after removing the largest root from the sample estimate of $\Sigma$ (see Lawley, 1956). Let 

$$X = [x_{it}] \quad (i = 1, \ldots, p;\ t = 1, \ldots, n) \tag{2.7}$$

be the $n \times p$ matrix of a sample of size $n$ from the distribution of $\mathbf{x}$. The maximum likelihood 
estimate of $\Sigma$ based on $n$ degrees of freedom is $A/n$, where 

$$A = [a_{ij}] = X'X \tag{2.8}$$
is the matrix of the sum of squares and products of the sample values. There is an orthogonal 
matrix 

$$G = [g_{ij}] = [\mathbf{g}_{(1)} \,|\, \mathbf{g}_{(2)} \,|\, \cdots \,|\, \mathbf{g}_{(p)}]' \tag{2.9}$$

such that 

$$A = G' \operatorname{diag}(\theta_1^2, \ldots, \theta_p^2)\, G, \tag{2.10}$$

where $\theta_1^2 > \theta_2^2 > \cdots > \theta_p^2$ are the latent roots of $A$, with $\mathbf{g}_{(1)}, \mathbf{g}_{(2)}, \ldots, \mathbf{g}_{(p)}$ as the corresponding 
latent vectors. 

$$\mathbf{z}' = [z_1, z_2, \ldots, z_p] = \mathbf{x}'G' \tag{2.11}$$

is the transformation from $\mathbf{x}$ to the sample principal components $z_i$ ($i = 1, \ldots, p$). 

The assumption that all the remaining roots of $\Sigma$, when the largest root $\sigma_1^2$ is removed, are 
equal can, if desired, be tested by an approximate $\chi^2$ test due to Bartlett (1950, 1951b, 
1951c, 1954). Lawley (1956) later investigated this statistic in considerable detail. Throughout 
this paper it is assumed that this test gives a non-significant result, thus providing no 
evidence against the adequacy of a single non-isotropic principal component 

$$\mathbf{l}_{(1)}'\mathbf{x}. \tag{2.12}$$
Let the null-hypothesis, now, specify that a hypothetical function $\mathbf{l}'\mathbf{x}$, where 

$$\mathbf{l}' = [l_1, \ldots, l_p], \qquad \mathbf{l}'\mathbf{l} = 1, \tag{2.13}$$

is the single non-isotropic principal component. If this is true, $\mathbf{l} = \mathbf{l}_{(1)}$, the true principal 
component. If, therefore, we take any orthogonal matrix $L$ with $\mathbf{l}'$ as its first row and make 
the transformation (2.3), then $y_1$ is normally distributed with zero mean and variance $\sigma_1^2$, 
while $y_i$ ($i = 2, \ldots, p$) are all normal independent variables with zero means and variances 
$\sigma^2 = 1$; all the $y_i$ are independently distributed. 


Hence if 

$$Y = [y_{it}] = XL' \quad (n \times p), \tag{2.14}$$

then 

$$\chi^2 = \sum_{i=2}^{p}\sum_{t=1}^{n} y_{it}^2 \tag{2.15}$$

is a $\chi^2$ with $n(p-1)$ degrees of freedom. Let 

$$B = [b_{ij}] = Y'Y = LX'XL' = LAL'. \tag{2.16}$$

The above $\chi^2$ in (2.15) can now be expressed as 

$$\sum_{i=1}^{p}\sum_{t=1}^{n} y_{it}^2 - \sum_{t=1}^{n} y_{1t}^2 = \operatorname{tr} B - \mathbf{l}'A\mathbf{l} = \operatorname{tr} A - \mathbf{l}'A\mathbf{l},$$

as $L$ is orthogonal. Hence the statistic for testing the over-all goodness-of-fit of the single 
hypothetical (non-isotropic) principal component $\mathbf{l}'\mathbf{x}$ is 

$$\chi^2 = \operatorname{tr} A - \lambda^2, \tag{2.17}$$

where $\lambda^2/n = \mathbf{l}'A\mathbf{l}/n = \sum_{t=1}^{n} y_{1t}^2/n$ is the sample variance of $\mathbf{l}'\mathbf{x}$. Obviously $\lambda^2/\sigma_1^2$ has a $\chi^2$ 
distribution with $n$ degrees of freedom. This $\chi^2$ of (2.17) measures the over-all departure 
from the null hypothesis. As already explained in § 1, this departure can be due to (i) there 
being more than one true non-isotropic principal component, and/or (ii) the hypothetical 
component $\mathbf{l}$ being not the same as the true component $\mathbf{l}_{(1)}$, i.e. due to difference in the 
directions of $\mathbf{l}$ and $\mathbf{l}_{(1)}$. If we are more interested in the direction aspect of the hypothetical 
component, it is necessary to isolate the contribution $\chi_d^2$ to the total $\chi^2$, due to direction, and 
test it. This is done in the next section. 
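3. PARTITIONING OF THE OVER-ALL $\chi^2$ 


In canonical analysis Bartlett (1951a) used an ingenious geometrical method for deriving 
the ‘direction factor’. A similar method, with appropriate changes corresponding to the 
‘internal’ analysis, is employed here. If the null-hypothesis is true, i.e. $\mathbf{l} = \mathbf{l}_{(1)}$, then any 
variate $y_i = \mathbf{l}_{(i)}'\mathbf{x}$, where $\mathbf{l}_{(i)}$ is orthogonal to $\mathbf{l}$ (and $\mathbf{l}_{(i)}'\mathbf{l}_{(i)} = 1$), has unit variance and is 
independently distributed of $y_1 = \mathbf{l}'\mathbf{x}$. It is easy to see that the sample projection of one 
normal variable on another independent normal variable is also a normal variable. Hence 
the sample projection of $\mathbf{l}_{(i)}'\mathbf{x}$ on $\mathbf{l}'\mathbf{x}$, viz. 

$$\mathbf{l}_{(i)}'A\mathbf{l}/(\mathbf{l}'A\mathbf{l})^{1/2},$$

is a standard normal variable. This is true for every $\mathbf{l}_{(i)}$ orthogonal to $\mathbf{l}$. Hence 

$$\sum_{i=2}^{p} \frac{(\mathbf{l}_{(i)}'A\mathbf{l})^2}{\mathbf{l}'A\mathbf{l}}$$

has a $\chi^2$ distribution with $(p-1)$ degrees of freedom. This is the required direction component 
$\chi_d^2$ of the total $\chi^2$. This can be better expressed as 

$$\begin{aligned}
\chi_d^2 &= \sum_{i=1}^{p} \frac{(\mathbf{l}'A\mathbf{l}_{(i)})(\mathbf{l}_{(i)}'A\mathbf{l})}{\mathbf{l}'A\mathbf{l}} - \mathbf{l}'A\mathbf{l} \\
&= \frac{\mathbf{l}'A\left(\sum_{i=1}^{p}\mathbf{l}_{(i)}\mathbf{l}_{(i)}'\right)A\mathbf{l}}{\lambda^2} - \lambda^2 \\
&= \frac{\mathbf{l}'A^2\mathbf{l}}{\lambda^2} - \lambda^2,
\end{aligned} \tag{3.1}$$

as $\sum_{i=1}^{p} \mathbf{l}_{(i)}\mathbf{l}_{(i)}' = I$, the $\mathbf{l}_{(i)}$ being unit orthogonal vectors (with $\mathbf{l}_{(1)} = \mathbf{l}$). 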


4. $\chi_d^2$ IN TERMS OF RECTANGULAR CO-ORDINATES 
There is also an alternative analytical method of obtaining $\chi_d^2$. The matrix $B$ of (2.16) has 
the Wishart distribution 

$$\text{const.}\;|B|^{\frac{1}{2}(n-p-1)} \exp\{-\tfrac{1}{2}(b_{22} + b_{33} + \cdots + b_{pp} + b_{11}/\sigma_1^2)\}\,dB, \tag{4.1}$$

where $dB$ stands for the product of the differentials of the $p(p+1)/2$ distinct elements of $B$. 

Make the transformation 

$$B = TT', \tag{4.2}$$

where 

$$T = [t_{ij}] \tag{4.3}$$

is a lower triangular matrix (i.e. $t_{ij} = 0$ if $j > i$); the $t_{ij}$ are called rectangular co-ordinates. 
The Jacobian of transformation (see Deemer & Olkin, 1951) is 

$$2^p \prod_{i=1}^{p} t_{ii}^{\,p-i+1}$$

and the distribution of $T$ is, therefore (see also Mauldon, 1955), 

$$\text{const.}\;\prod_{i=1}^{p} t_{ii}^{\,n-i} \exp\{-\tfrac{1}{2}t_{11}^2/\sigma_1^2 - \tfrac{1}{2}(\operatorname{tr} TT' - t_{11}^2)\}\,dT. \tag{4.4}$$

Hence $t_{11}^2$ is a $\chi^2\sigma_1^2$ with $n$ degrees of freedom, while $t_{ii}^2$ ($i = 2, \ldots, p$) is a $\chi^2$ with $n-i+1$ degrees 
of freedom and $t_{ij}$ ($i > j$) are standard normal variables; all of them are independently 
distributed. The over-all $\chi^2$ of (2.17) is thus, alternatively, 

$$t_{21}^2 + t_{22}^2 + \cdots + t_{pp}^2 = \operatorname{tr} TT' - t_{11}^2. \tag{4.5}$$


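The required direction component $\chi_d^2$ measuring the departure of the hypothetical function 
$y_1 = \mathbf{l}'\mathbf{x}$ from the true one, viz. $\mathbf{l}_{(1)}'\mathbf{x}$, due to direction, is 

$$\chi_d^2 = t_{21}^2 + t_{31}^2 + \cdots + t_{p1}^2, \tag{4.6}$$

which is a $\chi^2$ with $(p-1)$ degrees of freedom if $\mathbf{l} = \mathbf{l}_{(1)}$. The identity of this expression with 
(3.1) can be established by equating the elements in the first row of 

$$TT' = LAL' \tag{4.7}$$

(each is equal to $B$; see (4.2) and (2.16)). We get 

$$t_{11}^2 = \mathbf{l}'A\mathbf{l} = \lambda^2, \qquad t_{11}t_{21} = \mathbf{l}_{(2)}'A\mathbf{l}, \qquad \ldots, \qquad t_{11}t_{p1} = \mathbf{l}_{(p)}'A\mathbf{l}. \tag{4.8}$$

Squaring and adding, 

$$t_{11}^2\,(t_{11}^2 + t_{21}^2 + \cdots + t_{p1}^2) = \sum_{i=1}^{p} \mathbf{l}'A\mathbf{l}_{(i)}\mathbf{l}_{(i)}'A\mathbf{l} = \mathbf{l}'A^2\mathbf{l},$$

as $\mathbf{l}_{(i)}$ are unit orthogonal vectors. Hence 

$$t_{21}^2 + t_{31}^2 + \cdots + t_{p1}^2 = \frac{\mathbf{l}'A^2\mathbf{l}}{\lambda^2} - \lambda^2 = \chi_d^2. \tag{4.9}$$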

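5. $\chi_d^2$ IN TERMS OF SAMPLE LATENT ROOTS AND ‘RESIDUAL’ ROOTS 
From (2.3) and (2.11), the relation between the true and sample principal components is 

$$\mathbf{y} = LG'\mathbf{z} = W\mathbf{z}, \tag{5.1}$$

where 

$$LG' = W = [w_{ij}]. \tag{5.2}$$

Hence 

$$y_1 = \mathbf{l}'\mathbf{x} = w_{11}z_1 + \cdots + w_{1p}z_p, \tag{5.3}$$

which, for convenience, will be written simply as $\sum_{i=1}^{p} w_i z_i$ hereafter. It should be noted that 

$$WW' = LG'GL' = LL' = I \tag{5.4}$$

and from (2.10), (2.14) and (2.16), 

$$B = W \operatorname{diag}(\theta_1^2, \ldots, \theta_p^2)\, W'. \tag{5.5}$$

Hence 

$$\lambda^2 = \mathbf{l}'A\mathbf{l} = \sum_{i=1}^{p} w_i^2\theta_i^2. \tag{5.6}$$

From (2.10) and (5.2), 

$$\mathbf{l}'A^2\mathbf{l} = \sum_{i=1}^{p} w_i^2\theta_i^4. \tag{5.7}$$

Hence from (3.1) 

$$\chi_d^2 = \frac{\sum_{i=1}^{p} w_i^2\theta_i^4}{\sum_{i=1}^{p} w_i^2\theta_i^2} - \sum_{i=1}^{p} w_i^2\theta_i^2. \tag{5.8}$$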


Thus, if the hypothetical principal component is given in terms of the sample principal 
components, as in (5.3), and if it is the true one, (5.8) gives a suitable expression for $\chi_d^2$ in terms 
of the sample latent roots $\theta_i^2$. Bartlett (1951a) has derived a similar expression in ‘canonical’ 
analysis. This expression is suitable for expressing $\chi_d^2$ in terms of $\theta_i^2$ and $\phi_j^2$ ($j = 1, 2, \ldots, p-1$), 
the sample ‘residual’ roots when the hypothetical principal component $\sum_{i=1}^{p} w_i z_i$ is eliminated. 
This idea of residual roots is due to Williams (1952) and was employed by him in discriminant 
analysis. $\theta_i$ ($i = 1, 2, \ldots, p$) are the principal semi-axes of the ellipsoid 

$$z_1^2/\theta_1^2 + \cdots + z_p^2/\theta_p^2 = 1.$$

The section by a $(p-1)$-space normal to the direction $(w_1, \ldots, w_p)$ specified by the 
hypothetical principal component $\sum w_i z_i$ is an ellipsoid with principal semi-axes $\phi_j$ ($j = 1, \ldots, p-1$). 
Williams (1952) has proved that these $\phi$’s, the residual roots, satisfy the equation 

$$\sum_{i=1}^{p} \frac{w_i^2\theta_i^2}{\theta_i^2 - \phi^2} = 0 \tag{5.9}$$




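of degree $(p-1)$ in $\phi^2$. Collecting the coefficients of $(\phi^2)^{p-1}$, $(\phi^2)^{p-2}$, and the constant term, 
we find 

$$\prod_{j=1}^{p-1} \phi_j^2 = \prod_{i=1}^{p} \theta_i^2 \Big/ \lambda^2 \tag{5.10}$$

and 

$$\sum_{i=1}^{p} \theta_i^2 - \sum_{j=1}^{p-1} \phi_j^2 = \sum_{i=1}^{p} w_i^2\theta_i^4 \Big/ \lambda^2.$$

Hence from (5.8) 

$$\chi_d^2 = \sum_{i=1}^{p} \theta_i^2 - \sum_{j=1}^{p-1} \phi_j^2 - \lambda^2. \tag{5.11}$$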


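This is the required expression for $\chi_d^2$ in terms of the original roots $\theta_i^2$, the residual roots $\phi_j^2$ 
and $\lambda^2/n$, the sample variance accounted for by the hypothetical component. Williams has 
also proved that 

$$w_i^2 = \frac{\lambda^2 \prod_{j=1}^{p-1} (\theta_i^2 - \phi_j^2)}{\theta_i^2 \prod_{j \ne i} (\theta_i^2 - \theta_j^2)}. \tag{5.12}$$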

6. CONDITIONAL DISTRIBUTION OF THE ORIGINAL AND THE RESIDUAL ROOTS 


In canonical analysis, Williams (1952) has explicitly derived the conditional joint distribution 
of the original and the residual roots, making use of the sufficiency of the multiple 
correlation of the single true discriminant function. This property of sufficiency was first noted 
by Bartlett (1947, 1951a). In a recent publication, the present author (1960) has supplied 
some missing steps and arguments in Williams’s derivation of the distribution. In this 
section, a similar argument is employed for considering the distribution of the original roots 
$\theta_i^2$ and the residual roots $\phi_j^2$ when $\lambda^2$ is fixed. 

The distribution of $B$ is given by (4.1). Transform from $B$ to $\theta_i^2$ ($i = 1, \ldots, p$) and $W$, using 
(5.4) and (5.5), viz. 

$$B = W \operatorname{diag}(\theta_1^2, \ldots, \theta_p^2)\, W', \qquad WW' = I.$$

The transformation is unique if $w_i > 0$ ($i = 1, \ldots, p$). Since $W$ is orthogonal, only $p(p-1)/2$ 
elements of $W$ are independent. We can take these to be $w_1, w_2, \ldots, w_{p-1}$ and suitable others. 
($w_p$ cannot be taken as $\sum_{1}^{p} w_i^2 = 1$.) These are called the co-ordinates of $W$. When $\sigma_1^2$ in the 
distribution of $B$ is unity, it has been proved (see Anderson, 1958) that the $\theta_i^2$ and $W$ are 
independently distributed. The distribution of $\theta_i^2$ in this case is 

$$f_1(\theta_1^2, \ldots, \theta_p^2) \prod_{i=1}^{p} d\theta_i^2 = \text{const.} \prod_{i=1}^{p} (\theta_i^2)^{\frac{1}{2}(n-p-1)} \exp\Big(-\tfrac{1}{2}\sum_{i=1}^{p}\theta_i^2\Big) \prod_{i<j} (\theta_i^2 - \theta_j^2) \prod_{i=1}^{p} d\theta_i^2. \tag{6.1}$$

Unfortunately, an explicit expression for the distribution of $W$ in terms of the co-ordinates 
of $W$ is not available. Halmos (1950) (see also Anderson, 1958, pp. 318-23) has proved that it 
is the ‘Haar Invariant’ distribution. Symbolically we may write the distribution as 

$$f_2(W)\, d(W). \tag{6.2}$$
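However, when $\sigma_1^2 \ne 1$, the exponent in the distribution of $B$ is 

$$\begin{aligned}
\frac{b_{11}}{\sigma_1^2} + b_{22} + \cdots + b_{pp} &= \operatorname{tr} B + \Big(\frac{1}{\sigma_1^2} - 1\Big) b_{11} \\
&= \sum_{1}^{p} \theta_i^2 + \Big(\frac{1}{\sigma_1^2} - 1\Big) \sum_{1}^{p} w_i^2\theta_i^2 \\
&= \sum_{1}^{p} \theta_i^2 + \Big(\frac{1}{\sigma_1^2} - 1\Big) \lambda^2
\end{aligned} \tag{6.3}$$

and hence $\theta_i^2$ and $W$ are no longer independent, but their joint distribution is 

$$\frac{1}{\sigma_1^n}\, f_1(\theta_1^2, \ldots, \theta_p^2)\, f_2(W) \exp\Big\{-\frac{1}{2}\Big(\frac{1}{\sigma_1^2} - 1\Big)\lambda^2\Big\} \prod_{1}^{p} d\theta_i^2\, dW. \tag{6.4}$$

It was noted in § 2 that $\lambda^2/\sigma_1^2$ is a $\chi^2$ with $n$ degrees of freedom and hence the conditional 
distribution of $\theta_i^2$ and $W$ when $\lambda^2$ is fixed is 

$$\text{const.}\; \frac{f_1(\theta_1^2, \ldots, \theta_p^2)\, f_2(W)\, dW \prod_{1}^{p} d\theta_i^2}{e^{-\frac{1}{2}\lambda^2} (\lambda^2)^{\frac{1}{2}n-1}\, d\lambda^2}. \tag{6.5}$$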

Rigorously one of the quantities $\theta_i^2$ and $W$ must be expressed in terms of $\lambda^2$ using (5.10) but 
this is not carried out explicitly to preserve symmetry. The distribution (6.5) does not involve 
$\sigma_1$, showing that $\lambda^2$ is a sufficient statistic. Following Williams’s (1952) idea in canonical 
analysis, theoretically it is possible to transform to $\phi_1^2, \ldots, \phi_{p-1}^2$ from $w_1, w_2, \ldots, w_{p-1}$ using 
(5.12) and then to integrate out the other irrelevant co-ordinates of $W$ in (6.5) and thus derive 
the conditional distribution of $\theta_i^2$ and $\phi_j^2$ when $\lambda^2$ is fixed. However, this does not seem to be 
possible, in the general case, as $f_2(W)$ is not known. In the particular case $p = 2$, however, 
it can be easily shown that 

$$f_2(W) = \text{const.}\; \frac{1}{\sqrt{1-w^2}}, \tag{6.6}$$

where 

$$W = \begin{bmatrix} w & \sqrt{1-w^2} \\ -\sqrt{1-w^2} & w \end{bmatrix}.$$

Hence the joint distribution of $\theta_1^2$ and $\theta_2^2$ when $\lambda^2$ is fixed is 

$$\text{const.}\; \frac{(\theta_1^2\theta_2^2)^{\frac{1}{2}(n-3)} \exp\{-\tfrac{1}{2}(\theta_1^2 + \theta_2^2)\}\, (\theta_1^2 - \theta_2^2)\, d\theta_1^2\, d\theta_2^2}{e^{-\frac{1}{2}\lambda^2} (\lambda^2)^{\frac{1}{2}n-1} (\theta_1^2 - \lambda^2)^{\frac{1}{2}} (\lambda^2 - \theta_2^2)^{\frac{1}{2}}}. \tag{6.7}$$

When $p = 2$, we find from (5.11) 

$$\chi_d^2 = \theta_1^2 + \theta_2^2 - \lambda^2 - \theta_1^2\theta_2^2/\lambda^2 = (\theta_1^2 - \lambda^2)(\lambda^2 - \theta_2^2)/\lambda^2. \tag{6.8}$$

In (6.7), make the transformation 

$$t_{21}^2 = (\theta_1^2 - \lambda^2)(\lambda^2 - \theta_2^2)/\lambda^2, \tag{6.9}$$

$$t_{22}^2 = \theta_1^2\theta_2^2/\lambda^2. \tag{6.10}$$

The Jacobian of transformation is $\lambda^2/(\theta_1^2 - \theta_2^2)$, and hence it will be found that $t_{21}^2$ and $t_{22}^2$ are 
independently distributed as $\chi^2$ with 1 and $(n-1)$ degrees of freedom, respectively. One can 
readily identify these $t_{21}$ and $t_{22}$ with the elements of the matrix $T$ in (4.2). As Williams 
(1952) has observed in canonical analysis, $t_{21}^2$ is dependent mainly on $\theta_1^2$, while $t_{22}^2$ depends 
mainly on $\theta_2^2$, even though both are formally symmetric functions of $\theta_1^2$ and $\theta_2^2$. This is so 
because if the hypothetical principal component is identical with the true one, $\lambda^2$ must lie in 
the neighbourhood of $\theta_1^2$ and hence 

$$\chi_d^2 = t_{21}^2 \sim (\theta_1^2 - \lambda^2)/\lambda^2, \qquad \chi^2 - \chi_d^2 = t_{22}^2 \sim \theta_2^2,$$

and thus $\chi_d^2$ is appropriate for testing the concordance of the data with the hypothetical 
principal component, while $\chi^2 - \chi_d^2$ gives a test for departure from the assumption of 
adequacy of a single non-isotropic principal component. Williams (1952) calls this latter test 
a ‘fluid’ one because it depends on the choice of the hypothetical principal component and 
the resulting $\lambda^2$. The justification for the use of $t_{22}^2$ is that it is independent of the unknown 
parameter $\sigma_1^2$, being dependent instead on a statistic which, while to some extent arbitrary, 
is yet determinate. 


7. ILLUSTRATION 


To illustrate the goodness-of-fit test derived in this paper, a random sample of size $n = 50$ 
was drawn, using Wold’s (1948) random normal deviates, from a multivariate normal 
distribution of four variables $x_1$, $x_2$, $x_3$, $x_4$ with zero means and variance-covariance matrix 

$$\Sigma = (\sigma_1^2 - 1)\,\mathbf{l}\mathbf{l}' + I,$$

where $\sigma_1^2 > 1$, $\mathbf{l}'\mathbf{l} = 1$ and $\mathbf{l}'$ was taken proportional to 

$$\mathbf{m}' = [m_1, m_2, m_3, m_4] = [0.3333, -0.6666, 1.0, 0.3333].$$
The matrix $A$ of the sum of squares and products of the sample values was 

$$A = \begin{bmatrix}
60.3282 & -43.6958 & 74.0990 & 27.1455 \\
-43.6958 & 124.7021 & -120.1149 & -33.5915 \\
74.0990 & -120.1149 & 218.3674 & 62.0241 \\
27.1455 & -33.5915 & 62.0241 & 76.2765
\end{bmatrix}.$$

The latent roots of $A/n$, the estimate of $\Sigma$, are 

$$\theta_1^2 = 6.9748, \quad \theta_2^2 = 1.1783, \quad \theta_3^2 = 0.8194, \quad \theta_4^2 = 0.6211.$$


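The assumption that $\Sigma$ has only one root $> 1$ and all others equal to unity can be tested, 
if desired, by using Bartlett’s statistic (1950, see also Lawley, 1956) 

$$\Big[n - 1 - \tfrac{1}{6}\Big(2p - 1 + \frac{2}{p-1}\Big) + \frac{1}{(\theta_1^2 - 1)^2}\Big] \Big[-\log_e(\theta_2^2\theta_3^2\theta_4^2) + \theta_2^2 + \theta_3^2 + \theta_4^2 - (p-1)\Big].$$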


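In the example under consideration, the value of this is 6.2157. Comparing with the 5 % 
value of a $\chi^2$ with $p(p-1)/2 = 6$ degrees of freedom, this is not significant. There is thus no 
evidence against the adequacy of a single non-isotropic principal component. We shall now 
test the null-hypothesis that the single non-isotropic principal component is $\mathbf{l}'\mathbf{x}$, with $\mathbf{l}'$ 
proportional to 

$$\mathbf{m}' = [0.3333, -0.6666, 1.0, 0.3333]$$

(since the example is constructed artificially we know that this is the true principal 
component). The over-all $\chi^2$ of § 2 gives 

$$\chi^2 = \operatorname{tr} A - \mathbf{l}'A\mathbf{l} = \operatorname{tr} A - \frac{\mathbf{m}'A\mathbf{m}}{\mathbf{m}'\mathbf{m}} = 131.522$$

for $n(p-1) = 150$ degrees of freedom and is not significant. Its direction component is 

$$\chi_d^2 = \frac{\mathbf{l}'A^2\mathbf{l}}{\mathbf{l}'A\mathbf{l}} - \mathbf{l}'A\mathbf{l} = \frac{\mathbf{m}'A^2\mathbf{m}}{\mathbf{m}'A\mathbf{m}} - \frac{\mathbf{m}'A\mathbf{m}}{\mathbf{m}'\mathbf{m}} = 0.523$$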

for $(p-1) = 3$ degrees of freedom and is not significant. The hypothetical principal component 
is thus in agreement with the hypothesis, as it should be. However, if we take any 
other hypothetical component, say one proportional to 

$$[1, 1, -2, 3],$$

we find $\chi^2 = 409.0945$ and $\chi_d^2 = 69.9746$. Both are significant and $\chi_d^2$ indicates that the 
proposed principal component differs from the true one significantly in direction. If $\mathbf{m}$ is the 
latent vector of $\Sigma$ corresponding to the largest root, it is obvious that only the ratio 
$m_1 : m_2 : m_3 : m_4$ is unique, and hence any one of them can be taken to be unity. Let us choose 
$m_3 = 1$ and then the sample estimates of the three ratios $m_1$, $m_2$, $m_4$ are provided by the first 
latent vector of $A$, which in the present example is 

$$\mathbf{m}' = [0.3904, -0.6645, 1.0, 0.3485].$$
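
The two statistics of the example can be recomputed from the printed matrix. The sketch below is plain Python; the function name `fit_statistics` is an illustrative assumption, and the matrix entries are those printed above (rounded to four decimals), so agreement with the quoted values can only be expected up to rounding.

```python
# Upper triangle of A as printed in the text (n = 50, p = 4).
upper = [
    [60.3282, -43.6958, 74.0990, 27.1455],
    [124.7021, -120.1149, -33.5915],
    [218.3674, 62.0241],
    [76.2765],
]
p = len(upper)
A = [[0.0] * p for _ in range(p)]
for i, row in enumerate(upper):
    for k, a in enumerate(row):
        A[i][i + k] = A[i + k][i] = a          # fill in the symmetric matrix

def fit_statistics(A, m):
    """Return (chi2, chi2_d) of (2.17) and (3.1) for a direction m of any scaling."""
    Am = [sum(a * x for a, x in zip(row, m)) for row in A]
    mm = sum(x * x for x in m)
    mAm = sum(x * y for x, y in zip(m, Am))
    mA2m = sum(y * y for y in Am)
    lam2 = mAm / mm                            # lambda^2 = l'Al with l = m / |m|
    trace = sum(A[i][i] for i in range(len(A)))
    return trace - lam2, mA2m / mAm - lam2

chi2_true, chi2_d_true = fit_statistics(A, [0.3333, -0.6666, 1.0, 0.3333])
chi2_alt, chi2_d_alt = fit_statistics(A, [1.0, 1.0, -2.0, 3.0])
```

Up to rounding, the first pair should reproduce 131.522 and 0.523, and the second pair 409.0945 and 69.9746, as quoted in the text. Working with the unnormalized vector $\mathbf{m}$ avoids computing $\mathbf{l}$ explicitly, since both statistics depend on $\mathbf{m}$ only through ratios of quadratic forms.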
Bartlett’s (1951a) method of substituting the estimates of all of $m_1$, $m_2$, $m_3$, $m_4$ but one in $\chi^2$ 
and equating it to the 5 % point of the $\chi^2$ distribution with appropriate degrees of freedom 
[$n(p-1) - 2$ and not $n(p-1)$, as estimates of two ratios are substituted] will yield the approximate 
95 % confidence intervals for $m_1$, $m_2$, $m_3$ and $m_4$ readily [$m_3 = 1$ but to preserve symmetry, 
a confidence interval for that also was obtained by Bartlett]. These are given at the 
end of this section. This method can be used with $\chi_d^2$ also, but in this case it is not 
straightforward, because the inequality $\chi_d^2 <$ 5 % pt. of a $\chi^2$ with $3 - 2 = 1$ degree of freedom, where 

$$\chi_d^2 = \frac{\mathbf{m}'A^2\mathbf{m}}{\mathbf{m}'A\mathbf{m}} - \frac{\mathbf{m}'A\mathbf{m}}{\mathbf{m}'\mathbf{m}},$$

yields successively, 

$$\begin{aligned}
g_1(m_1) &= m_1^4 + 7.3025m_1^3 + 9.9037m_1^2 - 11.4221m_1 + 2.1266 < 0, \\
g_2(m_2) &= \cdots < 0, \\
g_3(m_3) &= m_3^4 - 0.6342m_3^3 - 1.4439m_3^2 + 0.3897m_3 + 0.5163 < 0, \\
g_4(m_4) &= m_4^4 + 8.6774m_4^3 + 14.8688m_4^2 - 14.2838m_4 + 2.0636 < 0,
\end{aligned}$$

when estimates of all of $m_1$, $m_2$, $m_3$, $m_4$ but one are substituted in $\chi_d^2$. The roots of $g_1(m_1) = 0$ 
are 

$$0.2516, \quad 0.5264, \quad -4.5558, \quad -3.5248$$


and as $g_1(m_1) < 0$, the confidence interval for $m_1$ is either $(0.2516, 0.5264)$ or $(-4.5558, 
-3.5248)$; but as the sample estimate of $m_1$ is 0.3904, it is reasonable to take the confidence 
interval $(0.2516, 0.5264)$. Similarly, the roots of $g_2(m_2) = 0$ are 

$$-0.8646, \quad -0.4914, \quad 2.0663, \quad 1.7829$$

and as $g_2(m_2) < 0$ and $m_2 = -0.6645$, the confidence interval for $m_2$ is $(-0.8646, -0.4914)$. 
The roots of the other two quartic equations $g_3(m_3) = 0$ and $g_4(m_4) = 0$ are respectively 

$$0.7899, \quad 1.2781, \quad -0.7654, \quad -0.6684$$

and 

$$0.1832, \quad 0.5256, \quad -5.4675, \quad -3.9186.$$

Arguing as in the case of $m_1$ and $m_2$, the confidence intervals for $m_3$ and $m_4$ are $(0.7899, 
1.2781)$ and $(0.1832, 0.5256)$, respectively. Thus we get the following table: 

                                              Confidence interval
Parameter   True value   Sample estimate   Based on $\chi^2$       Based on $\chi_d^2$
$m_1$         0.3333        0.3904         (-0.1118, 1.0414)     (0.2516, 0.5264)
$m_2$        -0.6666       -0.6645         (-1.5262, -0.1479)    (-0.8646, -0.4914)
$m_3$         1.0           1.0            (0.4342, 2.6623)      (0.7899, 1.2781)
$m_4$         0.3333        0.3485         (0.1762, 1.0153)      (0.1832, 0.5256)


It can thus be seen that the confidence intervals based on $\chi_d^2$ are much narrower than those 
based on the over-all $\chi^2$ and provide much better information, though the calculations require 
solving quartic equations. 
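
The quartic step can be sketched for $m_1$ as follows. Since $g_1$ is negative at the sample estimate 0.3904, the interval is bounded by the nearest roots on either side, which a simple bisection locates. The coefficients are those printed for $g_1$; the bracketing points 0 and 1 and the helper `bisect` are illustrative assumptions, not part of the paper's procedure.

```python
def g1(m):
    """Quartic g_1 from the text (coefficients as printed, rounded to four decimals)."""
    return m**4 + 7.3025 * m**3 + 9.9037 * m**2 - 11.4221 * m + 2.1266

def bisect(f, lo, hi, tol=1e-10):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (flo > 0):
            lo, flo = mid, f(mid)      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)

estimate = 0.3904                      # sample estimate of m_1, where g_1 < 0
lower = bisect(g1, 0.0, estimate)      # g_1(0) > 0 > g_1(estimate)
upper = bisect(g1, estimate, 1.0)      # g_1(estimate) < 0 < g_1(1)
```

The two end-points should agree with the interval $(0.2516, 0.5264)$ reported in the table to about four decimal places; the same approach applies to $g_3$ and $g_4$ with brackets chosen around their sample estimates.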


My sincere thanks are due to Professor M. S. Bartlett for his kind interest in this work and 
for much valuable guidance and helpful discussions during the preparation of this paper. 
On the evaluation of the probability integral of the multivariate t-distribution


By S. JOHN
Indian Statistical Institute, Calcutta


1. INTRODUCTION AND SUMMARY 


A random vector $t = (t_1, t_2, \ldots, t_p)$ is said to have the $p$-variate $t$-distribution with $n$ degrees of freedom if its distribution in $p$-dimensional Euclidean space has for its density function the function

$$g_n(t_1, t_2, \ldots, t_p; P) = \frac{|A|^{1/2}\,\Gamma\{\tfrac12(n+p)\}}{(n\pi)^{p/2}\,\Gamma(\tfrac12 n)} \Bigl[1 + \frac1n \sum_{i=1}^{p}\sum_{j=1}^{p} \alpha_{ij}t_it_j\Bigr]^{-\frac12(n+p)} \quad (-\infty < t_i < \infty;\ i = 1, 2, \ldots, p), \tag{1}$$

with $P = (\rho_{ij})$ a positive definite $p \times p$ matrix having every one of its diagonal elements equal to unity, $A = (\alpha_{ij}) = P^{-1}$, and $|A|$ denoting the determinant of the matrix $A$. We shall introduce the symbol $G_n(h_1, h_2, \ldots, h_p; P)$ to denote the corresponding distribution function, i.e.

$$G_n(h_1, \ldots, h_p; P) = \int_{-\infty}^{h_1}\int_{-\infty}^{h_2}\cdots\int_{-\infty}^{h_p} g_n(t_1, \ldots, t_p; P)\,dt_1 \ldots dt_p. \tag{2}$$

In the bivariate case, we shall write $g_n(t_1, t_2; \rho_{12})$ for $g_n(t_1, t_2; P)$ and $G_n(t_1, t_2; \rho_{12})$ for $G_n(t_1, t_2; P)$.

If the random variables $x_1, \ldots, x_p$ follow the multivariate normal law with means zero, common variance $\sigma^2$ and a correlation matrix $(\rho_{ij})$ and if $(ns^2)/\sigma^2$ is an independent $\chi^2$ variable with $n$ degrees of freedom, the random vector $t = (t_1, \ldots, t_p)$, where $t_i = x_i/s$ $(i = 1, \ldots, p)$, will have $g_n(t_1, \ldots, t_p; P)$ as density function. Bechhofer, Dunnett & Sobel (1954) consider this distribution in connexion with a problem in the ranking of means of normal populations. Dunnett & Sobel (1954) give a formula for evaluating the probability integral when $p = 2$. Using this, they have prepared tables of the function $G_n(h, h; \pm 0.5)$ and its inverse. Dunnett & Sobel (1955) provide approximations for $G_n(h_1, \ldots, h_p; P)$, valid when $P$ satisfies certain conditions. In this paper we give an alternative formula for the evaluation of the probability integral. Though we too discuss only the bivariate case in detail, our method is of wider applicability in the sense that it can be adopted to get the probability integral of the multivariate t-distribution of any dimension.

We must mention here that the method of this paper is similar to that of Kendall (1941) 
for evaluating the probability integral of the multivariate normal distribution. 

We have already mentioned that the multivariate t-distribution arises in the ranking of normal populations according to their means. We give more applications towards the end of this paper. It is shown how the multivariate t-distribution can be used in setting up simultaneous confidence bounds for the means of correlated normal variables. Other applications are in constructing simultaneous confidence bounds for the parameters in a linear model and for future observations from a multivariate normal distribution. Further applications will appear in a later paper.
2. THE CHARACTERISTIC FUNCTION


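To derive our formula for the probability integral we require the characteristic function. For this purpose, we may, without loss of generality, put $\sigma = 1$. The characteristic function is given by

$$\begin{aligned}
\phi(\theta_1, \theta_2, \ldots, \theta_p) &= \int_0^\infty\int_{-\infty}^{+\infty}\!\!\cdots\!\int_{-\infty}^{+\infty} \frac{|A|^{1/2}}{(2\pi)^{p/2}} \exp(i\theta_1x_1/s + i\theta_2x_2/s + \ldots + i\theta_px_p/s) \\
&\qquad \times \exp\Bigl(-\tfrac12\sum\sum\alpha_{ij}x_ix_j\Bigr) \frac{(ns^2)^{\frac12n-1}e^{-\frac12ns^2}}{2^{\frac12n}\,\Gamma(\tfrac12n)}\,dx_1 \ldots dx_p\,d(ns^2) \\
&= \frac{|A|^{1/2}}{(2\pi)^{p/2}\,2^{\frac12n}\,\Gamma(\tfrac12n)} \int_0^\infty s^p(ns^2)^{\frac12n-1}e^{-\frac12ns^2}\,d(ns^2) \int_{-\infty}^{+\infty}\!\!\cdots\!\int_{-\infty}^{+\infty} \exp(i\theta_1z_1 + \ldots + i\theta_pz_p) \\
&\qquad \times \exp\Bigl(-\tfrac12s^2\sum\sum\alpha_{ij}z_iz_j\Bigr)\,dz_1 \ldots dz_p \\
&= \int_0^\infty \frac{(ns^2)^{\frac12n-1}e^{-\frac12ns^2}}{2^{\frac12n}\,\Gamma(\tfrac12n)} \exp\Bigl(-\frac{1}{2s^2}\theta P\theta'\Bigr)\,d(ns^2) \\
&= \frac{1}{\Gamma(\tfrac12n)}\int_0^\infty t^{\frac12n-1}\exp\Bigl[-t - \frac{n}{4t}\theta P\theta'\Bigr]dt, \qquad (3)
\end{aligned}$$

where $\theta = (\theta_1, \ldots, \theta_p)$.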





3. THE PROBABILITY INTEGRAL 


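For the sake of simplicity we now restrict our attention to the case $p = 2$; exactly similar methods will give the probability integral whatever positive integral value $p$ has.
In this case,

$$\phi(\theta_1, \theta_2) = \frac{1}{\Gamma(\tfrac12n)}\int_0^\infty t^{\frac12n-1}\exp\Bigl[-t - \frac{n}{4t}(\theta_1^2 + \theta_2^2)\Bigr] \sum_{r=0}^{\infty}\frac{1}{r!}\Bigl(-\frac{n\rho}{2t}\Bigr)^r\theta_1^r\theta_2^r\,dt, \tag{4}$$

where we have set $\rho = \rho_{12}$.
By the inversion theorem, the joint density function $g_n(t_1, t_2; \rho)$ of $t_1$ and $t_2$ is given by

$$\begin{aligned}
g_n(t_1, t_2; \rho) &= \frac{1}{(2\pi)^2}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\phi(\theta_1, \theta_2)\exp(-i\theta_1t_1 - i\theta_2t_2)\,d\theta_1\,d\theta_2 \\
&= \frac{1}{\Gamma(\tfrac12n)}\int_0^\infty t^{\frac12n-1}e^{-t}\,\frac{t}{n\pi}\sum_{r=0}^{\infty}\frac{1}{r!}\Bigl(\frac{n\rho}{2t}\Bigr)^r \Bigl(\frac{\partial}{\partial t_1}\Bigr)^r\exp\Bigl(-\frac{t}{n}t_1^2\Bigr) \Bigl(\frac{\partial}{\partial t_2}\Bigr)^r\exp\Bigl(-\frac{t}{n}t_2^2\Bigr)dt. \qquad (5)
\end{aligned}$$

We are now in a position to evaluate the probability integral. It is easy to see that

$$G_n(h_1, h_2; \rho) = \psi_{n,0}(h_1, h_2) + \frac{1}{2\pi}\sum_{r=1}^{\infty}\frac{\rho^r}{r!}\psi_{n,r}(h_1, h_2), \tag{6}$$

where

$$\psi_{n,r}(h_1, h_2) = \frac{1}{\Gamma(\tfrac12n)}\int_0^\infty \exp\Bigl[-t\Bigl\{1 + \frac1n(h_1^2 + h_2^2)\Bigr\}\Bigr]\,t^{\frac12n-1}\, H_{r-1}\bigl([2t/n]^{1/2}h_1\bigr)\,H_{r-1}\bigl([2t/n]^{1/2}h_2\bigr)\,dt \quad (r = 1, 2, \ldots). \tag{7}$$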
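Here $H_r(x)$ denotes the Hermite polynomial of degree $r$ defined by

$$\Bigl(-\frac{d}{dx}\Bigr)^r e^{-\frac12x^2} = H_r(x)\,e^{-\frac12x^2}, \tag{8}$$

and

$$\psi_{n,0}(h_1, h_2) = \frac{1}{\Gamma(\tfrac12n)}\int_0^\infty t^{\frac12n-1}e^{-t}\, G\bigl([2t/n]^{1/2}h_1\bigr)\,G\bigl([2t/n]^{1/2}h_2\bigr)\,dt, \tag{9}$$

where

$$G(x) = \frac{1}{(2\pi)^{1/2}}\int_{-\infty}^{x}e^{-\frac12u^2}\,du. \tag{10}$$

We give below explicit expressions for $\psi_{n,r}(h_1, h_2)$ $(r = 1, 2, \ldots, 6)$. If we set

$$z = 1 + (1/n)(h_1^2 + h_2^2), \tag{11}$$

then

$$\psi_{n,1}(h_1, h_2) = z^{-\frac12n}, \tag{12}$$

$$\psi_{n,2}(h_1, h_2) = h_1h_2z^{-(\frac12n+1)}, \tag{13}$$

$$\psi_{n,3}(h_1, h_2) = \{1 + (2/n)\}h_1^2h_2^2z^{-(\frac12n+2)} - (h_1^2 + h_2^2)z^{-(\frac12n+1)} + z^{-\frac12n}, \tag{14}$$

$$\psi_{n,4}(h_1, h_2) = \Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)h_1^3h_2^3z^{-(\frac12n+3)} - 3\Bigl(1 + \frac2n\Bigr)h_1h_2(h_1^2 + h_2^2)z^{-(\frac12n+2)} + 9h_1h_2z^{-(\frac12n+1)}, \tag{15}$$

$$\begin{aligned}
\psi_{n,5}(h_1, h_2) &= \Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)\Bigl(1 + \frac6n\Bigr)h_1^4h_2^4z^{-(\frac12n+4)} - 6\Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)h_1^2h_2^2(h_1^2 + h_2^2)z^{-(\frac12n+3)} \\
&\quad + 3\Bigl(1 + \frac2n\Bigr)(h_1^4 + 12h_1^2h_2^2 + h_2^4)z^{-(\frac12n+2)} - 18(h_1^2 + h_2^2)z^{-(\frac12n+1)} + 9z^{-\frac12n}, \qquad (16)
\end{aligned}$$

$$\begin{aligned}
\psi_{n,6}(h_1, h_2) &= \Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)\Bigl(1 + \frac6n\Bigr)\Bigl(1 + \frac8n\Bigr)h_1^5h_2^5z^{-(\frac12n+5)} \\
&\quad - 10\Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)\Bigl(1 + \frac6n\Bigr)h_1^3h_2^3(h_1^2 + h_2^2)z^{-(\frac12n+4)} \\
&\quad + 5\Bigl(1 + \frac2n\Bigr)\Bigl(1 + \frac4n\Bigr)h_1h_2(3h_1^4 + 20h_1^2h_2^2 + 3h_2^4)z^{-(\frac12n+3)} \\
&\quad - 150\Bigl(1 + \frac2n\Bigr)h_1h_2(h_1^2 + h_2^2)z^{-(\frac12n+2)} + 225h_1h_2z^{-(\frac12n+1)}. \qquad (17)
\end{aligned}$$

The evaluation of $\psi_{n,0}(h_1, h_2)$ requires separate consideration.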
4. EVALUATION OF $\psi_{n,0}(h_1, h_2)$

First we observe that
$$\psi_{n,0}(h_1, h_2) = G_n(h_1, h_2; 0). \tag{18}$$

Equation (9) can be used to evaluate $\psi_{n,0}(h_1, h_2)$. The integration involved has to be done numerically. Gauss's formula for numerical quadrature (Kopal, 1955, p. 371) will be found especially convenient. By this method we have prepared tables of $\psi_{n,0}(h_1, h_2)$ for $n = 11, 12$. The first of these, that for $n = 11$, is reproduced at the end of this paper as Table 2. The next section gives recurrence relations connecting these values of $n$ with other values of $n$.
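As a rough illustration of this numerical evaluation, the sketch below computes $\psi_{n,0}(h_1, h_2) = G_n(h_1, h_2; 0)$ as the gamma mixture of a product of two normal distribution functions. It is not the programme used for Table 2: Simpson's rule replaces Gauss's formula, the substitution $t = s^2$ is introduced to keep the integrand smooth at the origin, and the function names, the upper limit and the number of steps are assumptions of this sketch.

```python
import math


def normal_cdf(x):
    """Distribution function G(x) of a unit normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def psi0(n, h1, h2, steps=2000):
    """psi_{n,0}(h1, h2) by Simpson's rule (steps must be even).

    With t = s**2 the integral becomes
        (2 / Gamma(n/2)) * int_0^inf s^(n-1) exp(-s^2) G(c s h1) G(c s h2) ds,
    where c = sqrt(2/n).
    """
    c = math.sqrt(2.0 / n)
    upper = math.sqrt(n / 2.0) + 8.0  # the weight is negligible beyond this point
    norm = 2.0 / math.gamma(n / 2.0)

    def f(s):
        weight = norm * s ** (n - 1) * math.exp(-s * s)
        return weight * normal_cdf(c * s * h1) * normal_cdf(c * s * h2)

    width = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * width)
    return total * width / 3.0


if __name__ == "__main__":
    for h1, h2 in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (1.0, 1.0)]:
        print(f"psi_11,0({h1}, {h2}) = {psi0(11, h1, h2):.5f}")
```

The printed values can be compared with the corresponding entries of Table 2 for $n = 11$.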


Other methods for the evaluation of $\psi_{n,0}(h_1, h_2)$ are given by John (1961). The table was computed on HEC-2M at the Indian Statistical Institute. The programme was prepared by Mr P. K. Mitra, and Mr B. Mukherjee was responsible for the running of it on the computer. A similar table for the case $n = 12$ is available at the Institute.

In the tables, $\psi_{n,0}(h_1, h_2)$ has been tabulated only for values of $h_1$ and $h_2$ satisfying the inequalities $h_1 \ge 0$, $h_2 \ge 0$, $h_1 \ge h_2$. Values of $\psi_{n,0}(h_1, h_2)$ for $h_1$ or $h_2$ negative can be found from the formulae

$$\psi_{n,0}(h_1, h_2) = G_n(h_2) - \psi_{n,0}(-h_1, h_2), \tag{19}$$

$$\psi_{n,0}(h_1, h_2) = G_n(h_1) - \psi_{n,0}(h_1, -h_2), \tag{20}$$

where $G_n(x)$ is the probability that a random variable having Student's t-distribution with $n$ degrees of freedom has a value less than or equal to $x$. If both $h_1$ and $h_2$ are negative, we may use the formula

$$\psi_{n,0}(h_1, h_2) = 1 + \psi_{n,0}(-h_1, -h_2) - G_n(-h_1) - G_n(-h_2). \tag{21}$$

Further, there is no loss of generality in assuming $h_1 \ge h_2$ since

$$\psi_{n,0}(h_1, h_2) = \psi_{n,0}(h_2, h_1). \tag{22}$$


5. SOME USEFUL RECURRENCE RELATIONS 


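From equation (9), we get by integration by parts the following recurrence relation:

$$\begin{aligned}
\psi_{n,0}(h_1, h_2) &= \psi_{n+2,0}\Bigl(\Bigl[\frac{n+2}{n}\Bigr]^{1/2}h_1, \Bigl[\frac{n+2}{n}\Bigr]^{1/2}h_2\Bigr) \\
&\quad - \tfrac12(n\pi)^{-1/2}\frac{\Gamma\{\tfrac12(n+1)\}}{\Gamma(\tfrac12n+1)} \Biggl[\frac{h_1\,G_{n+1}\Bigl(\Bigl[\dfrac{n+1}{n+h_1^2}\Bigr]^{1/2}h_2\Bigr)}{\{1 + (1/n)h_1^2\}^{\frac12(n+1)}} + \frac{h_2\,G_{n+1}\Bigl(\Bigl[\dfrac{n+1}{n+h_2^2}\Bigr]^{1/2}h_1\Bigr)}{\{1 + (1/n)h_2^2\}^{\frac12(n+1)}}\Biggr]. \qquad (23)
\end{aligned}$$

In getting equation (23), $t^{\frac12n-1}$ was the factor which we selected for the first integration. If we now select the factor $e^{-t}$ for the first integration, we get the recurrence relation given below:*

$$\begin{aligned}
\psi_{n,0}(h_1, h_2) &= \psi_{n-2,0}\Bigl(\Bigl[\frac{n-2}{n}\Bigr]^{1/2}h_1, \Bigl[\frac{n-2}{n}\Bigr]^{1/2}h_2\Bigr) \\
&\quad + \tfrac12(n\pi)^{-1/2}\frac{\Gamma\{\tfrac12(n-1)\}}{\Gamma(\tfrac12n)} \Biggl[\frac{h_1\,G_{n-1}\Bigl(\Bigl[\dfrac{n-1}{n+h_1^2}\Bigr]^{1/2}h_2\Bigr)}{\{1 + (1/n)h_1^2\}^{\frac12(n-1)}} + \frac{h_2\,G_{n-1}\Bigl(\Bigl[\dfrac{n-1}{n+h_2^2}\Bigr]^{1/2}h_1\Bigr)}{\{1 + (1/n)h_2^2\}^{\frac12(n-1)}}\Biggr] \quad (n > 2). \qquad (24)
\end{aligned}$$

By the same procedure we can express $\psi_{2,0}(h_1, h_2)$ entirely in terms of the probability integral of the univariate t-distribution. This relationship is given by the following equation:

$$\psi_{2,0}(h_1, h_2) = \tfrac14 + \tfrac12\Biggl[\frac{h_1\,G_1([2 + h_1^2]^{-1/2}h_2)}{(2 + h_1^2)^{1/2}} + \frac{h_2\,G_1([2 + h_2^2]^{-1/2}h_1)}{(2 + h_2^2)^{1/2}}\Biggr]. \tag{25}$$

This, together with (24), shows that $\psi_{n,0}(h_1, h_2)$ for all even $n$ can be built up from tables of the probability integral of the univariate t-distribution. If $n$ is a small even integer this procedure is to be preferred. Similarly if $n$ is a fairly large integer (odd or even) it may be advantageous to connect it with the probability integral of the bivariate normal distribution through the formula (23).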
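The building-up procedure for even $n$ can be sketched as follows. The sketch assumes the reading of relations (24) and (25) in which $\psi_{n,0}$ is obtained from $\psi_{n-2,0}$ at arguments shrunk by $[(n-2)/n]^{1/2}$ plus two terms in the univariate integral $G_{n-1}$, and in which $\psi_{2,0}$ starts at $\tfrac14$. For convenience the univariate integral $G_m(x)$ is itself computed by quadrature rather than read from tables; all function names are assumptions of this illustration.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gamma_mixture(n, f, steps=2000):
    """(1/Gamma(n/2)) int_0^inf t^(n/2-1) e^(-t) f(sqrt(2t/n)) dt, via t = s^2 and Simpson."""
    c = math.sqrt(2.0 / n)
    upper = math.sqrt(n / 2.0) + 8.0
    norm = 2.0 / math.gamma(n / 2.0)

    def g(s):
        return norm * s ** (n - 1) * math.exp(-s * s) * f(c * s)

    width = upper / steps
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * width)
    return total * width / 3.0


def t_cdf(n, x):
    """Univariate probability integral G_n(x) of Student's t."""
    return gamma_mixture(n, lambda c: normal_cdf(c * x))


def psi0_quadrature(n, h1, h2):
    """Direct evaluation of psi_{n,0} from the integral (9)."""
    return gamma_mixture(n, lambda c: normal_cdf(c * h1) * normal_cdf(c * h2))


def psi0_n2(h1, h2):
    """psi_{2,0} from relation (25)."""
    def term(a, b):
        root = math.sqrt(2.0 + a * a)
        return a * t_cdf(1, b / root) / root
    return 0.25 + 0.5 * (term(h1, h2) + term(h2, h1))


def psi0_even(n, h1, h2):
    """psi_{n,0} for even n, applying relation (24) repeatedly down to n = 2."""
    if n == 2:
        return psi0_n2(h1, h2)
    shrink = math.sqrt((n - 2) / n)
    coeff = 0.5 / math.sqrt(n * math.pi) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

    def term(a, b):
        arg = math.sqrt((n - 1) / (n + a * a)) * b
        return a * t_cdf(n - 1, arg) / (1.0 + a * a / n) ** ((n - 1) / 2)

    return psi0_even(n - 2, shrink * h1, shrink * h2) + coeff * (term(h1, h2) + term(h2, h1))


if __name__ == "__main__":
    print(psi0_even(12, 1.0, 1.0), psi0_quadrature(12, 1.0, 1.0))
```

Comparing the recursive value with the direct quadrature of (9) is a convenient consistency check on both.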


* Relation (24) can be derived also from (23). 
5. COMPARISONS WITH THE VALUES GIVEN BY BECHHOFER, DUNNETT AND SOBEL (1954)


As a check on the correctness of our formula and as a means of seeing how good an approximation is obtained by taking the sum of a few terms from the beginning in the series expansion (6), we carried out some computations. The results are presented in Table 1, where we have taken $n = 12$ and $h_1 = h_2 = h$; in the approximation, we have used the first six terms in summing the second expression on the right-hand side of equation (6). It is seen that for $\rho = \pm 0.5$, the sum of the first seven terms of the series expansion (6) provides an approximation to the correct value which is satisfactory for most purposes. If the value of $|\rho|$ is larger, more terms have to be included in the summation to get the same accuracy. We have developed simpler methods for computing the probability integral in such cases. These methods will be published when the required tables are ready.


Table 1. Values of $G_{12}(h, h; \pm 0.5)$

          Case $\rho = 0.5$, $n = 12$           Case $\rho = -0.5$, $n = 12$
  h      Approx. from (6)   Exact value     Approx. from (6)   Exact value
 0.00        0.3333           0.33333           0.1667           0.16667
 0.25        0.4355           0.43555           0.2799           0.27988
 0.50        0.5421           0.54150           0.4131           0.41366
 0.75        0.6429           0.64292           0.5488           0.54880
 1.00        0.7330           0.73301           0.6694           0.66936
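The comparison in Table 1 can be reproduced approximately with the sketch below. It assumes the series in the form $G_n(h_1, h_2; \rho) = \psi_{n,0} + (2\pi)^{-1}\sum_{r \ge 1}(\rho^r/r!)\,\psi_{n,r}$, with $\psi_{n,r}$ for $r \ge 1$ the gamma-weighted integral of $H_{r-1}([2t/n]^{1/2}h_1)H_{r-1}([2t/n]^{1/2}h_2)$. Every $\psi_{n,r}$ is obtained here by Simpson's rule after the substitution $t = s^2$ instead of from closed expressions; the function names and quadrature settings are assumptions of this illustration.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hermite(r, x):
    """Hermite polynomial H_r defined by (-d/dx)^r exp(-x^2/2) = H_r(x) exp(-x^2/2)."""
    if r == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, r):
        prev, cur = cur, x * cur - k * prev
    return cur


def psi(n, r, h1, h2, steps=2000):
    """psi_{n,r}(h1, h2) for r = 0, 1, 2, ... by Simpson's rule in s = sqrt(t)."""
    c = math.sqrt(2.0 / n)
    z = 1.0 + (h1 * h1 + h2 * h2) / n
    upper = math.sqrt((n + 2 * r) / 2.0) + 8.0
    norm = 2.0 / math.gamma(n / 2.0)

    def f(s):
        if r == 0:
            return (norm * s ** (n - 1) * math.exp(-s * s)
                    * normal_cdf(c * s * h1) * normal_cdf(c * s * h2))
        return (norm * s ** (n - 1) * math.exp(-z * s * s)
                * hermite(r - 1, c * s * h1) * hermite(r - 1, c * s * h2))

    width = upper / steps
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(k * width)
    return total * width / 3.0


def bivariate_t_cdf(n, h1, h2, rho, terms=6):
    """Partial sum of the series: psi_{n,0} plus `terms` further terms."""
    total = psi(n, 0, h1, h2)
    for r in range(1, terms + 1):
        total += rho ** r / math.factorial(r) * psi(n, r, h1, h2) / (2.0 * math.pi)
    return total


if __name__ == "__main__":
    for h in (0.0, 1.0):
        print(h, round(bivariate_t_cdf(12, h, h, 0.5), 4),
              round(bivariate_t_cdf(12, h, h, -0.5), 4))
```

At $h = 0$ the series reduces to $\tfrac14 + (2\pi)^{-1}\sin^{-1}\rho$, which gives a simple independent check on the truncation error.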


6. THE MULTIVARIATE CASE 
We have discussed in detail how the probability integral of the bivariate t-distribution may be evaluated. It is possible to get an expression similar to (6) for the probability integral of a multivariate t-distribution of any dimension by methods similar to those which were adopted in the bivariate case. The formula is the following:





o ] 
Gi, (hy, hg, «+5 ty; P) = G(hy, hg, .-. hy; 1) + (2m) & = Yn. -++shy; P), (26) 
where 
1 . 
VY, 2(hy, ---» hy; P) = Tan) f the-lexp[—#{1+ (1/n) (AZ +... +A2)}]U,, (t; hy, ..., hp; P) dt, 


and (27) 


hob = hoe) Soma) EP ANT on 


It is to be understood that in (28) after expansion of the integrand, $H^r([2t/n]^{1/2}h_i)$ is to be replaced by $H_r([2t/n]^{1/2}h_i)$ if $r \ge 0$ and by $(2\pi)^{1/2}G([2t/n]^{1/2}h_i)\exp\{(1/n)th_i^2\}$ if $r = -1$.

For many of the terms in (26), explicit expressions similar to those in equations (12) to (17) can be given; but others have to be evaluated by numerical quadrature.


7. APPLICATIONS 


The multivariate t-distribution arises in many statistical problems. In this section we shall describe a few of these problems.
7.1. Simultaneous confidence bounds for the means of correlated normal variables


Suppose $x = (x_1, \ldots, x_p)$ follows a multivariate normal distribution with mean vector $\mu = (\mu_1, \ldots, \mu_p)$ and dispersion matrix $\sigma^2P$. $P$ is the correlation matrix and we shall suppose that this is known. The parameters $\mu$ and $\sigma^2$ are unknown. It is desired to set up simultaneous confidence bounds for the $\mu_i$'s.

This can be done as follows: draw a sample of size $N$ from the population of $x$'s. Let the observations be $x_{ij}$ $(i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, N)$, the first subscript pertaining to the variate and the second subscript pertaining to the sample. Let

$$\bar x_i = \Bigl(\sum_{j=1}^{N}x_{ij}\Bigr)\Big/N \quad (i = 1, 2, \ldots, p). \tag{29}$$

Then

$$s^2 = \Bigl[\sum_{i=1}^{p}\sum_{k=1}^{p}\alpha_{ik}\sum_{j=1}^{N}(x_{ij} - \bar x_i)(x_{kj} - \bar x_k)\Bigr]\Big/\{(N-1)p\} \tag{30}$$

is an unbiased estimator of $\sigma^2$. The random variable $(N-1)ps^2/\sigma^2$ is distributed independently of the $\bar x_i$'s as $\chi^2$ with $(N-1)p$ degrees of freedom. We shall set $n = (N-1)p$. Determine $h$ so that

$$\int_{-h}^{h}\cdots\int_{-h}^{h}g_n(t_1, \ldots, t_p; P)\,dt_1 \ldots dt_p = \alpha, \tag{31}$$

where $\alpha$ is a positive number between zero and one. Then the inequalities

$$\bar x_i - N^{-1/2}hs < \mu_i < \bar x_i + N^{-1/2}hs \quad (i = 1, 2, \ldots, p) \tag{32}$$

will be simultaneously satisfied with probability $\alpha$.
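A small sketch of the arithmetic behind these bounds may help. It assumes that $s^2$ is the pooled quadratic form in the deviations weighted by the elements $\alpha_{ik}$ of $P^{-1}$ and divided by $(N-1)p$, and that the multiplier $h$ has already been determined from the probability integral; the function name, the data and the value of $h$ below are hypothetical.

```python
import math


def simultaneous_bounds(x, alpha, h):
    """Bounds xbar_i -/+ h * s / sqrt(N) for the p means.

    x     : list of p rows, each holding the N observations on one variate
    alpha : p x p inverse of the known correlation matrix (assumed supplied)
    h     : multiplier taken from the probability integral (assumed known)
    """
    p, big_n = len(x), len(x[0])
    means = [sum(row) / big_n for row in x]
    quad = 0.0
    for i in range(p):
        for k in range(p):
            cross = sum((x[i][j] - means[i]) * (x[k][j] - means[k]) for j in range(big_n))
            quad += alpha[i][k] * cross
    s = math.sqrt(quad / ((big_n - 1) * p))  # pooled estimate of sigma
    half_width = h * s / math.sqrt(big_n)
    return [(m - half_width, m + half_width) for m in means], s


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]  # hypothetical sample, p = 2, N = 3
    identity = [[1.0, 0.0], [0.0, 1.0]]        # uncorrelated case, P = I
    bounds, s = simultaneous_bounds(data, identity, h=2.0)  # h = 2.0 is illustrative only
    print(bounds, s)
```

All $p$ intervals share the same half-width because a single pooled $s$ with $n = (N-1)p$ degrees of freedom is used for every variate.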

The condition that all the variates have the same variance can be slightly relaxed. It is enough that the ratios of the variances are known. Thus if $V(x_i) = c_i\sigma^2$ $(i = 1, 2, \ldots, p)$ and the $c_i$'s are known, we have only to consider $y_i = c_i^{-1/2}x_i$ in the place of $x_i$. This will yield simultaneous confidence bounds for the $c_i^{-1/2}\mu_i$'s, which can be readily converted into simultaneous confidence bounds for the $\mu_i$'s.

We must refer here to a paper by Olive Jean Dunn (1958) discussing methods of setting 
simultaneous confidence bounds for the expected values of correlated normal variables. She 
does not require that the correlation matrix be known. She achieves her result by con- 
sidering p linear combinations of the observations which have the same expectations as the 
original variables. The first linear combination is a linear combination of the observations 
on the variable x,; the second linear combination is a linear combination of the observations 
on x, and so on. In so far as the linear combinations are not unique, this procedure is 
unsatisfactory. 


7.2. Simultaneous confidence bounds for future observation


Here again we will suppose that the population is multivariate normal, that all the variates have equal variances and that the correlation matrix is known. A sample of size $N$ is available and it is required to set up simultaneous confidence bounds for the components of a future observation vector $x = (x_1, \ldots, x_p)$.

Consider the inequalities

$$\bar x_i - (1 + N^{-1})^{1/2}hs < x_i < \bar x_i + (1 + N^{-1})^{1/2}hs \quad (i = 1, 2, \ldots, p), \tag{33}$$

$\bar x_i$, $h$ and $s$ having the same meanings as in § 7.1. The probability that these inequalities will hold simultaneously is $\alpha$. The inequalities (33) will thus serve as simultaneous confidence bounds for the components of $x$.

As in § 7.1 the condition that all the variates should have the same variance can be replaced by the weaker condition that ratios of the variances should be known.


7.3. Simultaneous confidence bounds for regression coefficients*


In regression analysis we have $N$ independent normal variables $y_1, y_2, \ldots, y_N$ with common variance $\sigma^2$ and expectations given by

$$E(y_r) = \beta_0 + \sum_{i=1}^{p}\beta_ix_{ir} \quad (r = 1, 2, \ldots, N). \tag{34}$$

The $x_{ir}$'s are known constants. The $\beta$'s are unknown parameters. Put

$$S_{ij} = \sum_{r=1}^{N}(x_{ir} - \bar x_i)(x_{jr} - \bar x_j) \quad (i, j = 1, 2, \ldots, p), \tag{35}$$

$$S = (S_{ij}), \tag{36}$$

$$T_{yi} = \sum_{r=1}^{N}y_r(x_{ir} - \bar x_i) \quad (i = 1, 2, \ldots, p), \tag{37}$$

$$T_{yy} = \sum_{r=1}^{N}(y_r - \bar y)^2. \tag{38}$$

Let $(b_1, b_2, \ldots, b_p)$ be a solution of the equations

$$\begin{aligned}
S_{11}b_1 + S_{12}b_2 + \ldots + S_{1p}b_p &= T_{y1}, \\
S_{21}b_1 + S_{22}b_2 + \ldots + S_{2p}b_p &= T_{y2}, \\
\cdots\cdots\cdots & \qquad\qquad (39) \\
S_{p1}b_1 + S_{p2}b_2 + \ldots + S_{pp}b_p &= T_{yp}.
\end{aligned}$$

Also let

$$s^2 = \frac{1}{N-p-1}(T_{yy} - b_1T_{y1} - b_2T_{y2} - \ldots - b_pT_{yp}). \tag{40}$$

Then $(b_1, b_2, \ldots, b_p)$ have a multivariate normal distribution with means $(\beta_1, \beta_2, \ldots, \beta_p)$ and variance-covariance matrix $(c_{ij})\sigma^2 = S^{-1}\sigma^2$.

$s^2$ is an unbiased estimator of $\sigma^2$. $(N-p-1)s^2/\sigma^2$ has a $\chi^2$ distribution with $(N-p-1)$ degrees of freedom and is independent of $(b_1, \ldots, b_p)$. We shall set $n = N-p-1$.

The correlation matrix $(\rho_{ij})$ for $(b_1, b_2, \ldots, b_p)$ is given by

$$\rho_{ij} = (c_{ii}c_{jj})^{-1/2}c_{ij} \tag{41}$$

and is known. Determine $h$ so that

$$\int_{-h}^{h}\cdots\int_{-h}^{h}g_n(t_1, \ldots, t_p; P)\,dt_1 \ldots dt_p = \alpha. \tag{42}$$

Then the probability that the inequalities

$$b_i - hc_{ii}^{1/2}s < \beta_i < b_i + hc_{ii}^{1/2}s \quad (i = 1, 2, \ldots, p) \tag{43}$$

are simultaneously satisfied is $\alpha$. Thus inequalities (43) serve as simultaneous confidence bounds for $\beta_1, \beta_2, \ldots, \beta_p$. Simultaneous confidence bounds for any set of independent linear functions of the parameters can be set up in this fashion. These bounds appear to be more relevant than those of Scheffé (1953), who gives methods for setting up simultaneous confidence bounds for all possible linear functions of the parameters, standardized in a particular way. Most often we are interested in a few parametric functions and not the whole family of such functions. These functions get a bad deal in Scheffé's method; the width of the confidence interval is unnecessarily large.*


* I am indebted to Dr C. R. Rao for pointing out the possibility of this application.
Table 2. Values of $\psi_{n,0}(h_1, h_2)$ for $n = 11$

h_1 \ h_2    0.0      0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
  0.0      0.25000
  0.1      0.26946  0.29051
  0.2      0.28871  0.31133  0.33369
  0.3      0.30755  0.33170  0.35558  0.37894
  0.4      0.32579  0.35141  0.37675  0.40154  0.42552
  0.5      0.34326  0.37029  0.39703  0.42318  0.44848  0.47269
  0.6      0.35983  0.38819  0.41624  0.44368  0.47022  0.49562  0.51967
  0.7      0.37538  0.40498  0.43427  0.46290  0.49061  0.51712  0.54222  0.56575
  0.8      0.38983  0.42059  0.45100  0.48075  0.50953  0.53707  0.56314  0.58759  0.61027
  0.9      0.40314  0.43495  0.46641  0.49717  0.52693  0.55541  0.58238  0.60766  0.63111  0.65267
  1.0      0.41529  0.44805  0.48045  0.51213  0.54279  0.57212  0.59990  0.62593  0.65010  0.67230
  1.2      0.43616  0.47053  0.50452  0.53777  0.56994  0.60072  0.62988  0.65721  0.68258  0.70590
  1.4      0.45272  0.48835  0.52359  0.55806  0.59141  0.62333  0.65357  0.68191  0.70823  0.73243
  1.6      0.46552  0.50210  0.53828  0.57367  0.60791  0.64070  0.67175  0.70088  0.72792  0.75280
  1.8      0.47516  0.51244  0.54932  0.58539  0.62029  0.65372  0.68538  0.71509  0.74267  0.76805
  2.0      0.48229  0.52008  0.55745  0.59401  0.62940  0.66328  0.69539  0.72551  0.75349  0.77924
  2.2      0.48747  0.52561  0.56334  0.60025  0.63597  0.67019  0.70261  0.73303  0.76129  0.78731
  2.4      0.49119  0.52958  0.56755  0.60470  0.64067  0.67511  0.70775  0.73838  0.76685  0.79305
  2.6      0.49382  0.53239  0.57053  0.60785  0.64398  0.67858  0.71138  0.74215  0.77070  0.79709
  2.8      0.49568  0.53430  0.57262  0.61006  0.64630  0.68101  0.71391  0.74479  0.77349  0.79991
  3.0      0.49698  0.53574  0.57408  0.61160  0.64791  0.68270  0.71567  0.74662  0.77539  0.80187
  4.0      0.49948  0.53838  0.57686  0.61452  0.65097  0.68589  0.71900  0.75008  0.77897  0.80557
  5.0      0.49989  0.53881  0.57732  0.61499  0.65147  0.68641  0.71953  0.75063  0.77953  0.80615
  6.0      0.49997  0.53890  0.57740  0.61508  0.65156  0.68650  0.71963  0.75073  0.77964  0.80626
  7.0      0.49999  0.53892  0.57742  0.61510  0.65158  0.68653  0.71966  0.75076  0.77967  0.80629
  8.0      0.49999  0.53892  0.57743  0.61511  0.65159  0.68653  0.71966  0.75076  0.77967  0.80629


* Scheffé’s method is easier to apply, as a referee has pointed out. 
Table 2 (cont.)

h_1 \ h_2    1.0      1.2      1.4      1.6      1.8      2.0      2.2
  1.0      0.69253
  1.2      0.72714  0.76351
  1.4      0.75448  0.79224  0.82209
  1.6      0.77547  0.81430  0.84502  0.86864
  1.8      0.79118  0.83083  0.86222  0.88637  0.90450
  2.0      0.80271  0.84296  0.87485  0.89939  0.91784  0.93142
  2.2      0.81103  0.85172  0.88396  0.90881  0.92749  0.94125  0.95122
  2.4      0.81695  0.85795  0.89045  0.91551  0.93437  0.94827  0.95834
  2.6      0.82111  0.86233  0.89503  0.92024  0.93922  0.95322  0.96337
  2.8      0.82402  0.86540  0.89822  0.92355  0.94262  0.95669  0.96690
  3.0      0.82604  0.86753  0.90045  0.92585  0.94499  0.95911  0.96937
  4.0      0.82985  0.87154  0.90463  0.93019  0.94946  0.96369  0.97403
  5.0      0.83045  0.87217  0.90530  0.93088  0.95017  0.96442  0.97477
  6.0      0.83056  0.87229  0.90542  0.93101  0.95030  0.96455  0.97491
  7.0      0.83059  0.87231  0.90545  0.93104  0.95033  0.96459  0.97495
  8.0      0.83059  0.87232  0.90545  0.93104  0.95034  0.96459  0.97495

h_1 \ h_2    2.4      2.6      2.8      3.0      4.0      5.0      6.0      7.0      8.0
  2.4      0.96554
  2.6      0.97064  0.97578
  2.8      0.97421  0.97939  0.98302
  3.0      0.97671  0.98191  0.98556  0.98812
  4.0      0.98144  0.98670  0.99040  0.99299  0.99795
  5.0      0.98220  0.98747  0.99118  0.99378  0.99876  0.99958
  6.0      0.98234  0.98762  0.99133  0.99393  0.99892  0.99974  0.99990
  7.0      0.98238  0.98765  0.99136  0.99396  0.99895  0.99978  0.99994  0.99998
  8.0      0.98238  0.98766  0.99137  0.99397  0.99896  0.99979  0.99995  0.99999  1.00000
Computing the distribution of quadratic forms in normal variables


By J. P. IMHOF
University of Geneva


1. INTRODUCTION


In this paper exact and approximate methods are given for computing the distribution of quadratic forms in normal variables. In statistical applications the interest centres in general, for a quadratic form $Q$ and a given value $x$, around the probability $P\{Q > x\}$. Methods of computation have previously been given e.g. by Box (1954), Gurland (1955) and by Grad & Solomon (1955). None of these methods is very easily applicable except, when it can be used, the finite series of Box. Furthermore, all the methods are valid only for quadratic forms in central variables. Situations occur where quadratic forms in non-central variables must be considered as well. Let $x = (x_1, \ldots, x_n)'$ be a column random vector which follows a multidimensional normal law with mean vector $0$ and covariance matrix $\Sigma$. Let $\mu = (\mu_1, \ldots, \mu_n)'$ be a constant vector, and consider the quadratic form $Q = (x + \mu)'A(x + \mu)$. If $\Sigma$ is non-singular, one can by means of a non-singular linear transformation (Scheffé (1959), p. 418) express $Q$ in the form

$$Q = \sum_{r=1}^{m}\lambda_r\chi^2_{h_r;\,\delta_r^2}. \tag{1.1}$$

The $\lambda_r$ are the distinct non-zero characteristic roots of $A\Sigma$, the $h_r$ their respective orders of multiplicity, the $\delta_r$ are certain linear combinations of $\mu_1, \ldots, \mu_n$ and the $\chi^2_{h_r;\,\delta_r^2}$ are independent $\chi^2$-variables with $h_r$ d.f. (degrees of freedom) and non-centrality parameter $\delta_r^2$. The variable $\chi^2_{h;\,\delta^2}$ is defined here by the relation $\chi^2_{h;\,\delta^2} = (x_1 + \delta)^2 + \sum_{i=2}^{h}x_i^2$, where $x_1, x_2, \ldots, x_h$ are independent unit normal deviates. In case $\Sigma$ is singular but Ai is not, a similar decomposition can be obtained except that an additive constant might figure in the right-hand member of (1.1). This is the case for instance if $Q = (x_1 + x_2)^2 + x_3^2$, where the linear constraint $x_1 + x_2 = c$ is imposed.

We illustrate the reduction of a quadratic form to the form (1-1) on the correlator 


n 
aif’ Um — Mi) Yi- 1), 
where the x; and y; are normal deviates with zero means and we assume that 
cov (x,,2;) = cov (y;,¥;) = 5;;, cov (x,,y;) = pd; and |p| <1. 


Expressions for the probability density of w have been given in some particular cases of these 
assumptions by Roe & White (1961). Let 


S az (24, «<p See Fas---3%q) BOA § = (p,, --.2 Mas Fa ---0 Fa) - 


Then w = (z—€)’ A(z—6), where A and the covariance matrix = of z can be partitioned into 
square submatrices of n rows each: If U denotes the unit (identity) matrix, 


_—[U_ pU _fo U 
2=10U ul A= |: 


U O 
Under the linear transformation z* = Pz, where

Q’-ptU -—(1-py4 

(l+p)-$U = (1+p)-? U)’ 

the correlator takes the form w = (z* —¢*)’ A*(z* —¢*), where €* = P¢ and A* is a diagonal 
matrix having its n first diagonal elements equal to p— 1, the m remaining ones equal to p + 1. 


The covariance matrix of z* is the unit matrix. The random variable w can therefore be 
written 


p= 24[ 


w = (p+1)Xn;a3+ (0-1) Xnzat (1-2) 
a linear combination of two independent y?-variables with non-centrality parameters 


8 = +p) Smet 1)? and 63 = 4(1—p)"} Bime— mn) 


2. FINITE EXPRESSIONS

Box (1954) has considered the case where in (1.1) $\delta_r^2 = 0$ and $h_r = 2\nu_r$ $(r = 1, \ldots, m)$, the $\nu_r$ being positive integers. In statistical applications, it is often possible to arrange for the d.f.'s to be even. A problem in the theory of Gaussian noise where this condition is always fulfilled is described by Davenport & Root (1958). Box has then expressed the probability $P\{Q > x\}$ by means of a finite series of $\chi^2$ probabilities. Using a different proof, a modified and often more convenient form of his result will be obtained, as well as a corresponding expression for the probability density $g(x)$ of $Q$. Suppose $\lambda_1, \ldots, \lambda_m$ have been ordered, so that $\lambda_1 > \lambda_2 > \ldots > \lambda_p > 0 > \lambda_{p+1} > \ldots > \lambda_m$. Let $n = \sum_1^m\nu_r$ and $q = \sum_1^p\nu_r$.

THEOREM. If $x > 0$, then

$$P\Bigl\{\sum_{r=1}^{m}\lambda_r\chi^2_{2\nu_r} > x\Bigr\} = \sum_{k=1}^{p}\frac{1}{(\nu_k - 1)!}\Bigl[\frac{\partial^{\nu_k-1}}{\partial\lambda^{\nu_k-1}}F_k(\lambda, x)\Bigr]_{\lambda=\lambda_k}, \tag{2.1}$$

where

$$F_k(\lambda, x) = \lambda^{n-1}\exp\{-x/(2\lambda)\}\prod_{r=1,\ r \ne k}^{m}(\lambda - \lambda_r)^{-\nu_r}.$$

Proof. Let $\lambda'_1, \ldots, \lambda'_q, \lambda'_{q+1}, \ldots, \lambda'_n$ be real, pairwise unequal parameters such that $\lambda'_s > 0$ for $s = 1, \ldots, q$ and $\lambda'_s < 0$ for $s = q+1, \ldots, n$. Let $Q' = \sum_1^n\lambda'_s\chi^2_2(s)$, where the $n$ $\chi^2$-variables $\chi^2_2(s)$ with two d.f. each are independent. The characteristic function $\phi'(t)$ of $Q'$ is

$$\phi'(t) = \prod_1^n(1 - 2i\lambda'_st)^{-1},$$

and correspondingly the probability density is

$$g'(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}e^{-itx}\prod_1^n(1 - 2i\lambda'_st)^{-1}\,dt.$$

The integrand having simple poles only, one finds for $x > 0$

$$g'(x) = \sum_{s=1}^{q}t_s(x)\,{\prod_{s'=1}^{n}}'(\lambda'_s - \lambda'_{s'})^{-1},$$

where $\prod'$ indicates that the factor corresponding to $s' = s$ is omitted, and where

$$t_s(x) = \tfrac12\lambda_s'^{\,n-2}\exp\{-x/(2\lambda'_s)\}.$$
in which $X$ and $Y$ are independent Poisson variables with expected values $\tfrac12x$ and $\tfrac12\delta^2$, respectively. Johnson (1959) rediscovered this equality and used it as a starting point to
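The density $g(x)$ of $Q$ can now be obtained by a passage to the limit. Let $\sigma_0 = 0$, $\sigma_k = \sum_{r=1}^{k}\nu_r$ for $k = 1, \ldots, m$. For positive distinct integers $a, b, \ldots, d$ all not larger than $m$, denote by $\lim_{a,b,\ldots,d}$ the passage to the limit

$$\lambda'_{\sigma_{a-1}+1}, \ldots, \lambda'_{\sigma_a} \to \lambda_a;\quad \ldots;\quad \lambda'_{\sigma_{d-1}+1}, \ldots, \lambda'_{\sigma_d} \to \lambda_d.$$

Then $g(x) = \lim_{1,\ldots,m}g'(x)$ can be written

$$\begin{aligned}
g(x) &= \sum_{k=1}^{p}\lim_{k}\sum_{s=\sigma_{k-1}+1}^{\sigma_k}t_s(x)\,{\prod_{s'=\sigma_{k-1}+1}^{\sigma_k}}'(\lambda'_s - \lambda'_{s'})^{-1} \\
&\qquad \times \lim_{1,\ldots,k-1}\prod_{s' \le \sigma_{k-1}}(\lambda'_s - \lambda'_{s'})^{-1}\lim_{k+1,\ldots,m}\prod_{s' > \sigma_k}(\lambda'_s - \lambda'_{s'})^{-1} \\
&= -\sum_{k=1}^{p}\lim_{k}\sum_{s=\sigma_{k-1}+1}^{\sigma_k}f_k(\lambda'_s, x)\,{\prod_{s'=\sigma_{k-1}+1}^{\sigma_k}}'(\lambda'_s - \lambda'_{s'})^{-1},
\end{aligned}$$

where $f_k(\lambda, x) = (\partial/\partial x)F_k(\lambda, x)$. One recognizes the summation over $s$ as the divided difference of order $\nu_k - 1$ of the function $f_k(\lambda, x)$, corresponding to the $\nu_k$ 'abscissas' $\lambda'_{\sigma_{k-1}+1}, \ldots, \lambda'_{\sigma_k}$. Hence, by a well-known result, one has for $x > 0$

$$g(x) = -\sum_{k=1}^{p}\frac{1}{(\nu_k - 1)!}\Bigl[\frac{\partial^{\nu_k-1}}{\partial\lambda^{\nu_k-1}}f_k(\lambda, x)\Bigr]_{\lambda=\lambda_k}. \tag{2.2}$$

Upon integrating this expression, formula (2.1) results. For $x < 0$, one must substitute $\sum_{p+1}^{m}$ for $-\sum_{1}^{p}$ in (2.2) and $1 - \sum_{p+1}^{m}$ for $\sum_{1}^{p}$ in (2.1).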

A particular case of the theorem is 


v—1 
Even using this, it is not easily possible to establish in general the equivalence of (2-1) with 
formula (2-11) of Box (1954). 

The formulae (2.1) and (2.2) are very convenient to use when all $\nu_k$ are small. With increasing $\nu_k$'s, the labour of computing the corresponding derivatives of $F_k(\lambda, x)$ or $f_k(\lambda, x)$ rapidly becomes considerable. The numerical method of the next section is then often easier to apply.
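When every $\nu_k = 1$ no differentiation is needed, and the finite expression reduces to a sum over the positive roots of $\lambda_k^{\,n-1}\exp\{-x/(2\lambda_k)\}\prod_{r \ne k}(\lambda_k - \lambda_r)^{-1}$. The sketch below covers only this simplest case, under the assumption that the roots are pairwise distinct, non-zero, and each carries two degrees of freedom; the function name is hypothetical.

```python
import math


def tail_prob_two_df(lams, x):
    """P{ sum_s lam_s * chi2_2(s) > x } for x >= 0 and pairwise distinct non-zero lam_s.

    Special case of the finite expression in which every multiplicity nu_k is 1.
    """
    n = len(lams)
    total = 0.0
    for k, lam in enumerate(lams):
        if lam <= 0.0:
            continue  # for x >= 0 only the positive roots contribute
        denom = 1.0
        for r, other in enumerate(lams):
            if r != k:
                denom *= lam - other
        total += lam ** (n - 1) * math.exp(-x / (2.0 * lam)) / denom
    return total


if __name__ == "__main__":
    weights = [0.6, 0.3, 0.1]
    for x in (0.2, 2.0, 6.0):
        print(x, round(tail_prob_two_df(weights, x), 4))
```

For larger $\nu_k$ the derivatives in (2.1) would have to be supplied as well, which is where the labour mentioned in the text comes in.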

Consider now the case where the variables are non-central. The characteristic function of (1.1) is

$$\phi(t) = \prod_{r=1}^{m}(1 - 2i\lambda_rt)^{-\frac12h_r}\exp\Bigl\{i\sum_{r=1}^{m}\frac{\delta_r^2\lambda_rt}{1 - 2i\lambda_rt}\Bigr\}. \tag{2.3}$$

To obtain the probability density by integration of the inversion formula appears hopeless, except in the particular case $m = 1$. One finds then for the density $p_n(x, \delta^2)$ of $\chi^2_{n;\,\delta^2}$ the expression

$$p_n(x, \delta^2) = \tfrac12\exp\{-\tfrac12(x + \delta^2)\}\,(x^{1/2}/\delta)^{\frac12n-1}\,I_{\frac12n-1}(\delta x^{1/2}), \tag{2.4}$$

where $I_\nu(z)$ is the Bessel function of purely imaginary argument, of order $\nu$. This formula has been given first by Fisher (1928) who pointed out on the same occasion the relation

$$P\{\chi^2_{2\nu;\,\delta^2} < x\} = P\{X - Y \ge \nu\},$$
derive some simple approximations for $P\{\chi^2_{n;\,\delta^2} < x\}$. However, expressions in finite terms obtainable for such probabilities, when $n$ is odd, by integrating (2.4) do not seem to have previously been written down explicitly. We shall give the first few of them, which are simple enough to be useful in numerical work. It is necessary to assume $n > 1$. Write then $n = 2m + 3$ $(m = 0, 1, \ldots)$. A classical formula permits the expression in finite terms of the Bessel function of half integral order appearing in (2.4). One finds
1 a r m — ~
P,l2, 6?) = 26"-2(27)} [ev I ( an 1) A,,(", x) ~ ( nm 1) +e ae A,,(r, »)| ) 
4\m—r 
ste 400 an + #)I (Oot 
where a=x?-—6, b=2?+6, A,,(r,2) rlam—r) 


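The above density can be integrated explicitly. If $I(x)$ denotes the cumulative distribution function of a unit normal variable, the result is found to be

$$P\{\chi^2_{n;\,\delta^2} > x\} = 2 - I(a) - I(b) + \frac{1}{\delta^{n-2}(2\pi)^{1/2}}\bigl[K_n(\delta, a) - K_n(\delta, -b)\bigr],$$

where $a = x^{1/2} - \delta$, $b = x^{1/2} + \delta$, $K_n(\delta, x) = \exp\{-\tfrac12x^2\}P_n(\delta, x)$ and $P_n(\delta, x)$ is a polynomial in $\delta$ of degree $n - 3$. The general expression for $P_n(\delta, x)$ is not simple. The polynomials corresponding to the values $n = 3, 5, 7$ and $9$ are

$$\begin{aligned}
P_3(\delta, x) &= 1, \qquad P_5(\delta, x) = 2\delta^2 + x\delta - 1, \\
P_7(\delta, x) &= 3\delta^4 + 3x\delta^3 + (x^2 - 4)\delta^2 - 3x\delta + 3, \\
P_9(\delta, x) &= 4\delta^6 + 6x\delta^5 + (4x^2 - 10)\delta^4 + (x^3 - 15x)\delta^3 - (6x^2 - 18)\delta^2 + 15x\delta - 15.
\end{aligned}$$

In the applications, $b$ is often large enough so that one can assume $I(b) = 1$, $K_n(\delta, -b) = 0$.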
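The finite expressions for odd $n$ are easy to program. The sketch below assumes the reading $P\{\chi^2_{n;\delta^2} > x\} = 2 - I(a) - I(b) + [K_n(\delta, a) - K_n(\delta, -b)]/\{\delta^{n-2}(2\pi)^{1/2}\}$ with $a = x^{1/2} - \delta$, $b = x^{1/2} + \delta$, and the polynomials $P_3, P_5, P_7, P_9$ with leading coefficients 1, 2, 3, 4. It requires $\delta > 0$; the function and dictionary names are assumptions of this illustration.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Polynomials P_n(delta, y) for n = 3, 5, 7, 9, written with d = delta.
POLYNOMIALS = {
    3: lambda d, y: 1.0,
    5: lambda d, y: 2 * d**2 + y * d - 1,
    7: lambda d, y: 3 * d**4 + 3 * y * d**3 + (y**2 - 4) * d**2 - 3 * y * d + 3,
    9: lambda d, y: (4 * d**6 + 6 * y * d**5 + (4 * y**2 - 10) * d**4
                     + (y**3 - 15 * y) * d**3 - (6 * y**2 - 18) * d**2
                     + 15 * y * d - 15),
}


def noncentral_chi2_sf(n, delta, x):
    """P{chi2_{n; delta^2} > x} for n in {3, 5, 7, 9}, delta > 0 and x >= 0."""
    root = math.sqrt(x)
    a, b = root - delta, root + delta
    poly = POLYNOMIALS[n]

    def k_fun(y):
        return math.exp(-0.5 * y * y) * poly(delta, y)

    correction = (k_fun(a) - k_fun(-b)) / (delta ** (n - 2) * math.sqrt(2.0 * math.pi))
    return 2.0 - normal_cdf(a) - normal_cdf(b) + correction


if __name__ == "__main__":
    for n in (3, 5, 7, 9):
        print(n, round(noncentral_chi2_sf(n, 1.5, 4.0), 6))
```

For very small $\delta$ the division by $\delta^{n-2}$ amplifies rounding error, so the central formula is preferable in that limit.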


3. NUMERICAL INVERSION OF THE CHARACTERISTIC FUNCTION 


In this section we show that the cumulative distribution function $F(x)$ of the variable $Q$ defined by (1.1) can be obtained quite easily by straightforward numerical integration of an inversion formula. The standard inversion formula gives an expression not for $F(x)$ itself, but for the difference $F(x) - F(0)$. If $Q$, as is the case, for instance, in (1.2), is not strictly positive, a formula is needed which gives $F(x)$ directly. Such a formula is implicit in work of Gurland (1948) and has been later derived explicitly by Gil-Pelaez (1951), namely

$$F(x) = \tfrac12 - \frac1\pi\int_0^\infty t^{-1}\,\mathscr{I}\{e^{-itx}\phi(t)\}\,dt, \tag{3.1}$$

where $\mathscr{I}(z)$ denotes the imaginary part of $z$ and where the characteristic function $\phi(t)$ of $Q$ is given by (2.3). Using the relations

$$\arg[(1 - ibt)^{-g}] = g\tan^{-1}(bt), \qquad |(1 - ibt)^{-g}| = (1 + b^2t^2)^{-\frac12g},$$
$$\arg[\exp\{iat/(1 - ibt)\}] = at/(1 + b^2t^2), \qquad |\exp\{iat/(1 - ibt)\}| = \exp\{-abt^2/(1 + b^2t^2)\},$$

one finds that for the quadratic form $Q$ of (1.1), equation (3.1) can be written, after the substitution $2t = u$ is made,

$$P\{Q > x\} = \tfrac12 + \frac1\pi\int_0^\infty\frac{\sin\theta(u)}{u\rho(u)}\,du, \tag{3.2}$$

where

$$\theta(u) = \tfrac12\sum_{r=1}^{m}\bigl[h_r\tan^{-1}(\lambda_ru) + \delta_r^2\lambda_ru(1 + \lambda_r^2u^2)^{-1}\bigr] - \tfrac12xu,$$

$$\rho(u) = \prod_{r=1}^{m}(1 + \lambda_r^2u^2)^{\frac14h_r}\exp\Bigl\{\tfrac12\sum_{r=1}^{m}(\delta_r\lambda_ru)^2/(1 + \lambda_r^2u^2)\Bigr\}.$$

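The following relations are immediate:

$$\lim_{u \to 0}\frac{\sin\theta(u)}{u\rho(u)} = \tfrac12\sum_{r=1}^{m}\lambda_r(h_r + \delta_r^2) - \tfrac12x, \tag{3.3}$$

$$\lim_{u \to \infty}\theta(u) = \begin{cases} -\infty & \text{if } x > 0, \\ +\infty & \text{if } x < 0, \\ \dfrac{\pi}{4}\sum_{r=1}^{m}h_r\lambda_r|\lambda_r|^{-1} & \text{if } x = 0. \end{cases} \tag{3.4}$$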
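A minimal sketch of the numerical inversion follows. It assumes the inversion formula in the form $P\{Q > x\} = \tfrac12 + \pi^{-1}\int_0^\infty \sin\theta(u)/\{u\rho(u)\}\,du$ with $\theta$ and $\rho$ built from the $\lambda_r$, $h_r$ and $\delta_r^2$, uses the limiting value $\tfrac12\sum\lambda_r(h_r + \delta_r^2) - \tfrac12x$ for the ordinate at $u = 0$, and integrates by the trapezoidal rule over $0 \le u \le U$. The function names, the fixed upper limit $U = 200$ and the step $0.01$ are choices made for this sketch only; the truncation bound is the quantity $[\pi kU^k\prod|\lambda_r|^{h_r/2}\exp\{\ldots\}]^{-1}$ with $k = \tfrac12\sum h_r$.

```python
import math


def imhof_tail(x, lams, dofs, deltas_sq, upper=200.0, step=0.01):
    """P{Q > x} for Q = sum_r lam_r * chi2(h_r; delta_r^2), by the trapezoidal rule."""
    terms = list(zip(lams, dofs, deltas_sq))

    def theta(u):
        total = -0.5 * x * u
        for lam, h, d2 in terms:
            total += 0.5 * (h * math.atan(lam * u) + d2 * lam * u / (1.0 + (lam * u) ** 2))
        return total

    def rho(u):
        log_rho = 0.0
        for lam, h, d2 in terms:
            w = (lam * u) ** 2
            log_rho += 0.25 * h * math.log1p(w) + 0.5 * d2 * w / (1.0 + w)
        return math.exp(log_rho)

    def integrand(u):
        if u == 0.0:  # limiting ordinate at the origin
            return 0.5 * sum(lam * (h + d2) for lam, h, d2 in terms) - 0.5 * x
        return math.sin(theta(u)) / (u * rho(u))

    n_steps = int(round(upper / step))
    total = 0.5 * (integrand(0.0) + integrand(upper))
    for k in range(1, n_steps):
        total += integrand(k * step)
    return 0.5 + total * step / math.pi


def truncation_bound(upper, lams, dofs, deltas_sq):
    """Upper bound for the part of the integral beyond u = upper."""
    k = 0.5 * sum(dofs)
    prod = 1.0
    expo = 0.0
    for lam, h, d2 in zip(lams, dofs, deltas_sq):
        prod *= abs(lam) ** (0.5 * h)
        w = (lam * upper) ** 2
        expo += 0.5 * d2 * w / (1.0 + w)
    return 1.0 / (math.pi * k * upper ** k * prod * math.exp(expo))


if __name__ == "__main__":
    lams, dofs, nc = [0.6, 0.3, 0.1], [2, 2, 2], [0.0, 0.0, 0.0]
    for x in (0.2, 2.0, 6.0):
        print(x, round(imhof_tail(x, lams, dofs, nc), 4))
    print("truncation bound:", truncation_bound(200.0, lams, dofs, nc))
```

Because the integrand is an even function of $u$, its derivative vanishes at the origin, which is one reason the plain trapezoidal rule behaves well here.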


The function $u\rho(u)$ increases monotonically towards $+\infty$. Therefore, in numerical work the integration in (3.2) will be carried over a finite range $0 \le u \le U$ only; the degree of approximation obtained will depend (apart from rounding-off errors) on two sources of error: (i) the 'error of integration' resulting from the use of an approximate rule for computing

$$I_U = \frac1\pi\int_0^U[u\rho(u)]^{-1}\sin\theta(u)\,du,$$

and (ii) the 'error of truncation'

$$t_U = \frac1\pi\int_U^\infty[u\rho(u)]^{-1}\sin\theta(u)\,du.$$

Noticing that $x > y$ implies $x^2(1 + x^2)^{-1} > y^2(1 + y^2)^{-1}$, one finds that $|t_U|$ can be bounded above by $T_U$, where

$$T_U^{-1} = \pi kU^k\prod_{r=1}^{m}|\lambda_r|^{\frac12h_r}\exp\Bigl\{\tfrac12\sum_{r=1}^{m}\delta_r^2\lambda_r^2U^2(1 + \lambda_r^2U^2)^{-1}\Bigr\},$$

and where we have put $k = \tfrac12\sum_1^mh_r$. One can hopefully expect that $T_U$ will often be satisfactorily small, even for moderate values of $U$. It does not seem feasible on the other hand to obtain an upper bound for the error of integration resulting from the application of a standard quadrature formula. With equal step formulae (in the application of which the ordinate at the origin is given by (3.3)), a common procedure is then to carry out the integration repeatedly, each time halving the step of integration until two consecutive results agree to the desired accuracy. In order to gather some more detailed information relative to the performance of the two simplest and most commonly used rules, namely, the trapezoidal rule and Simpson's rule, we have preferred to follow a different procedure. Given a tolerance $\epsilon$, numerical integration in (3.2) was carried out by using steps of length $j/10$. The integer $j$ was then increased, one unit at a time, until the largest value was attained such that, for the first even number of steps yielding a bound $T_U < \epsilon$, the values of the integral obtained by the trapezoidal rule and by Simpson's rule, say $I_T$ and $I_S$, still satisfy $|I_T - I_S| < \epsilon$. Results for the values $\epsilon = 0.01$ and $\epsilon = 0.001$ are given in Table 1. The following unexpected behaviour was observed in all cases: As $j$ increases, $I_S$ departs from the correct value of the integral 'more rapidly' than does $I_T$. In other words, to achieve a given accuracy, longer steps are permissible with the trapezoidal rule than with Simpson's rule. Some computations made with a formula (sometimes referred to as Hardy's rule) based on approximation of the integrand with polynomials of degree six seem to indicate that in fact, the higher the degree of the approximation polynomials used, the shorter the step of integration must be. Accordingly, only the results obtained with the trapezoidal rule are recorded in columns (i) and (ii) of Table 1 and are seen to be of much greater accuracy than the value of $\epsilon$ would let one
Table 1. Probability that the quadratic form Q exceeds x


(i) Integration in (3.3) with the trapezoidal rule and n steps of length j, for tolerance ε = 0.01;
(ii) same for tolerance ε = 0.001; (iii) exact values; (iv) Pearson's three-moment χ² approximation;
(v) Patnaik's two-moment χ² approximation.

e= 0-01 e = 0-001 


A —_ 


Q ee (i) nj (ii) (iii) (iv) (v) 





Q, = 0-6y3+0-3x3+0-1yi O01 16 1-9 0-9478 136 1-0 0-9457 09458 1-0000 0-9184 
0-714 22 -5076 114 12 +5065 -5064 0-4901 -5079 
2 10 30 -1262 86 16 -1240 -1240 -1264 -1310 
Qe = 0-6y3+0-3x3+0-1y; 02 8 1:2 0-9903 24 0-8 0-9930 0-9936 1-0000 0-9867 
2 6 41:9 -4020 20 10 -3997 +3998 0-3961 -4098 
6 8 411 -0159 20 10 -0161 -0161 -0162 -0145 
Q; = 0-678 +0-3y3+0-1y3 1 6 O07 09980 14 0-4 0-9974 0-9973 0-9984 0-9961 
5 4 Il -4354 8 O07 +4353 +4353 -4346 -4400 
12 8 06 -0087 12 0-5 -0088 -0088 -0087 -0080 
Q, = 0-6y3+0-3xi+01¥8 1 8 Il 0-:9663 16 0-7 0-9666 0-9666 0-9767 0-9522 
3 6 14 -4201 10 10 -4195 -4196 -4156 -4330 
8 10 08 -0088 14 O08  -0087 -0087 -0086 -0066 
Q; = 0-756 + 0°3y3.2 2 6 0-3 09938 14 0-2 0-9937 0-9939 0-9928 0-9954 
10 4 O7 -4098 8 04 -4086 -4087 -4089 -4045 
20 6 O83 -0223 10 O38 -0221 -0221 -0221 -0230 
Q, = 0-7yi,6+0-3yi, 2 1 8 O04 09544 48 03 0-9549 0-9549 0-9515 0-9719 
6 6 O07 -4079 36 O4 +4075 -4076 -4084 -3948 
15 8 04 -0224 36 0-4 -0223 -0223 -0223 -0246 
405+ 29% 15 8 12 0-9891 12 0-9 0-9890 0-9891 0-9923 0-9842 
4 4 24 +3463 8 14 +3453 +3453 +3449 -3552 
7 8 412 -0154 12 10 -0154 -0154 0-154 -0131 
4Q,—4Q, -2 6 16 09103 10 1-2 0-9102 0-9102 0-9011 — 
0 4 22 -4052 10 13 +4061 -4061 -4221 — 
25 8 11 -0097 14 O8 -0098 -0097 -0032 — 
405 +42 35 6 0:6 0:9563 10 0-4 0-9563 0-9563 0-9560 0-9605 
8 4 11 -4161 8 O06 -4152 -4152 -4153  -4101 
13 6 O06 -0462 8 O58  -0462 -0462 -0461 -0474 
32;,—49, —-2 6 06 09218 8 0-5 0-9218 0-9218 0-9166¢ — 
2 2 39 -4685 4 14 -4782 -4779 -4810t — 
7 6 O68 -0396 10 04 -0396 -0396 -0387} — 
1(Q3+Q+9s + Qe) 3 10 1:0 0-9842 16 0-7 0-9842 0-9842 0-9840 0-9837 
6 6 20 -4270* 10 12 +4264 -4264 -4264 -4270 
10 12 O09 -0117 14 O8 -O0117 -0117  -0117  -0116 


4(Qs—Qs) + 3(Qe — Qa) ae -9 09861 18 0-7 09861 0-9861 0-9904 — 
20 -5180 12 Il -5170 -5170 +5154 — 
4 12 O9 -0152 16 O8 -0152 -0152 -0139 — 


— 
* Corresponds to first maximum of $|I_T - I_S|$. † Wilson-Hilferty approximation.
suspect. We have in several cases computed the correction terms, up to the one involving finite differences of orders one to six, which are to be added to the trapezoidal rule according to Gregory's formula. They have proved to give no reliable estimate of the error of integration resulting from the use of the trapezoidal rule. For ε = 0.01, it was found in two cases that as j increases, $|I_T - I_S|$ first increases to reach a maximum smaller than 0.01, then decreases again. The results corresponding to the value of j for which $|I_T - I_S|$ attains its first maximum are then recorded. It is clear that under such circumstances, the procedure we have followed is not an adequate one to use in general practice. Another example of this is provided by $Q = \tfrac12 Q_5 - \tfrac12 Q_6$, for x = 2. In this case, it was necessary to increase j up to 3.9 before $|I_T - I_S|$ reached ε = 0.01, while two steps of length 1.5 already make $T_U < 0.01$ and give the better value 0.4787. It is hoped, however, that a sufficient variety of quadratic forms is covered so that Table 1 can serve as a useful guide for the length of the step of integration to be used in any particular case. To help in this purpose, all quadratic forms listed satisfy $\sum_{r=1}^{m} |\lambda_r| = 1$.
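The procedure described above (trapezoidal rule on a finite range, with the range chosen so that the truncation bound falls below a tolerance) can be sketched as follows. This is only an illustration, not the program run on the 'Mercury' computer. It assumes the usual forms of θ(u) and ρ(u) in (3.2), namely $\theta(u) = \tfrac12\sum_r [h_r \arctan(\lambda_r u) + \delta_r^2\lambda_r u(1+\lambda_r^2u^2)^{-1}] - \tfrac12 xu$ and $\rho(u) = \prod_r (1+\lambda_r^2u^2)^{h_r/4}\exp\{\tfrac12\sum_r \delta_r^2\lambda_r^2u^2(1+\lambda_r^2u^2)^{-1}\}$, which are stated outside this section; the function names and the default step are hypothetical.

```python
import math

def theta(u, x, lam, h, delta2):
    """theta(u) as assumed for (3.2): half the summed phase terms minus x*u/2."""
    s = sum(hr * math.atan(lr * u) + d2 * lr * u / (1.0 + (lr * u) ** 2)
            for lr, hr, d2 in zip(lam, h, delta2))
    return 0.5 * s - 0.5 * x * u

def rho(u, lam, h, delta2):
    """rho(u) as assumed for (3.2)."""
    prod, expo = 1.0, 0.0
    for lr, hr, d2 in zip(lam, h, delta2):
        t = (lr * u) ** 2
        prod *= (1.0 + t) ** (hr / 4.0)
        expo += d2 * t / (1.0 + t)
    return prod * math.exp(0.5 * expo)

def truncation_bound(U, lam, h, delta2):
    """T_U, the upper bound for the error of truncation |t_U|."""
    k = 0.5 * sum(h)
    inv, expo = math.pi * k * U ** k, 0.0
    for lr, hr, d2 in zip(lam, h, delta2):
        inv *= abs(lr) ** (hr / 2.0)
        expo += d2 * (lr * U) ** 2 / (1.0 + (lr * U) ** 2)
    return 1.0 / (inv * math.exp(0.5 * expo))

def upper_tail(x, lam, h, delta2, step=0.1, eps=1e-4):
    """P{Q > x} by the trapezoidal rule, truncated as soon as T_U < eps."""
    def integrand(u):
        if u == 0.0:
            # limiting ordinate at the origin: theta'(0) / rho(0)
            return 0.5 * sum(lr * (hr + d2)
                             for lr, hr, d2 in zip(lam, h, delta2)) - 0.5 * x
        return math.sin(theta(u, x, lam, h, delta2)) / (u * rho(u, lam, h, delta2))

    n = 1
    while truncation_bound(n * step, lam, h, delta2) >= eps:
        n += 1
    total = 0.5 * (integrand(0.0) + integrand(n * step))
    total += sum(integrand(i * step) for i in range(1, n))
    return 0.5 + step * total / math.pi
```

For a single central χ² with 4 degrees of freedom, `upper_tail(3.0, [1.0], [4], [0.0])` can be compared with the closed form $e^{-x/2}(1 + x/2)$; the step and tolerance here are deliberately conservative and are not the values recorded in Table 1.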


The probability density g(x) of the quadratic form Q could be computed by using a formula analogous to (3.2). In fact,

\[ g(x) = (2\pi)^{-1} \int_0^\infty [\rho(u)]^{-1} \cos\theta(u)\,du. \]

Due to the absence of the factor $u^{-1}$ in the integrand, numerical integration, for the same accuracy, can be expected to require a slightly larger number of steps than is needed to compute the distribution function.


4. THE ACCURACY OF SOME APPROXIMATIONS 


Recently Johnson (1959) has shown that Pearson's (1959) three-moment central χ² approximation to the distribution of non-central χ² is remarkably accurate in both tails of the distribution. It is therefore natural to extend this approximation to the general case of the quadratic form Q of (1.1). Let E(Q) and σ(Q) be the mean and standard deviation of Q. Following Pearson, we write for a positive quadratic form Q,

\[ Q \approx (\chi^2_{h'} - h')(2h')^{-1/2}\,\sigma(Q) + E(Q), \]

and determine h′ so that both members have equal third moments. This amounts to taking

\[ P\{Q > x\} \approx P\{\chi^2_{h'} > y\}, \qquad (4.1) \]

where

\[ h' = c_2^3/c_3^2, \quad y = (x - c_1)(h'/c_2)^{1/2} + h', \quad c_j = \sum_{r=1}^{m} \lambda_r^j (h_r + j\delta_r^2) \quad (j = 1, 2, 3). \]

If Q is non-positive, the same approximation can be used but one must assume that $c_3 > 0$. Otherwise, approximate the distribution of −Q. The values obtained from (4.1) are shown in column (iv) of Table 1. For the quadratic form $\tfrac12 Q_5 - \tfrac12 Q_6$, one has h′ = 1395.85. In this case, the Wilson-Hilferty approximation was used to evaluate the right-hand member of (4.1). The merits of the three-moment approximation are obvious, particularly in the upper tail of positive quadratic forms. For such forms, the approximate values given by the standard two-moment (Patnaik) approximation are indicated in column (v). It is seen that the three-moment approximation, which requires very little more work, gives a much better fit than is achieved with the Patnaik approximation. While there is an appreciable loss of accuracy for non-positive forms, the approximation still gives useful indications which can suffice for certain practical purposes.
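The three-moment approximation (4.1) can be sketched in a few lines. The sketch below is an illustration only: the function names are hypothetical, and the χ² tail is evaluated with a plain power series for the incomplete gamma function, which is adequate for moderate arguments but is not suitable for extreme tails or very large h′ (the paper itself switches to the Wilson-Hilferty approximation in that situation).

```python
import math

def cumulant_sums(lam, h, delta2):
    """c_j = sum_r lam_r**j * (h_r + j * delta_r**2) for j = 1, 2, 3."""
    return [sum(l ** j * (hr + j * d2) for l, hr, d2 in zip(lam, h, delta2))
            for j in (1, 2, 3)]

def chi2_upper_tail(y, df):
    """P{chi-square_df > y} via the series for the regularized lower
    incomplete gamma function (moderate y only)."""
    if y <= 0.0:
        return 1.0
    a, z = 0.5 * df, 0.5 * y
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - total

def pearson_three_moment(x, lam, h, delta2):
    """Return (h', approximate P{Q > x}) according to (4.1)."""
    c1, c2, c3 = cumulant_sums(lam, h, delta2)
    if c3 <= 0.0:
        raise ValueError("c_3 must be positive; approximate -Q instead")
    h_prime = c2 ** 3 / c3 ** 2
    y = (x - c1) * math.sqrt(h_prime / c2) + h_prime
    return h_prime, chi2_upper_tail(y, h_prime)
```

When Q is itself a single central χ² the approximation is exact: with `lam=[1.0], h=[2], delta2=[0.0]` one gets h′ = 2 and y = x, so the result is $e^{-x/2}$.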
Most of the computations summarized in Table 1 have been performed on the Ferranti
‘Mercury’ computer at CERN, Geneva. I am much indebted to the late Director General of 
that organization, Prof. C. J. Bakker, who made it possible for me to use the facilities of the 
computer service. 

I wish to thank the Referee for suggesting some notable improvements in the presentation 
of the paper. 
Comparative efficiencies of methods of estimating parameters
in linear autoregressive schemes


By K. C. CHANDA
Bombay University


1. INTRODUCTION


It is normal practice to estimate the parameters of a linear autoregressive (a.r.) scheme of a specified order, say p, by minimizing the sum of squares of the residuals. Asymptotically, the resulting estimators are simply related to the first p serial correlations in the sample and are fully efficient; their properties have been investigated by Mann & Wald (1948).

What remains unsolved, however, is whether there exist other methods of estimation which give better results in short series. In principle one can use information contained in the higher-order serial correlations, and it is possible that in series of short length such information will prove useful. This possibility has been suggested by Ghurye (1950), Kendall (1949) and Quenouille (1957). A first step in the study of the properties of the estimators suggested by these authors is to investigate their asymptotic efficiency; this investigation forms the subject of this paper.


2. KENDALL’S METHOD 


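Let us assume that the a.r. model is

\[ \sum_{u=0}^{p} a_u X_{t-u} = Y_t \quad (a_0 = 1), \]

where $\{X_t\}$ is the a.r. process and $\{Y_t\}$ is a purely random process. We assume, further, that $E(X_t) = 0$ and $V(Y_t) = 1$. Kendall (1949) has suggested the following normal equations for the estimation of $a_u$ (u = 1, 2, ..., p):

\[ \frac{\partial}{\partial a_u}\Big(\sum_{s=1}^{k} \theta_s^2\Big) = 0 \quad (u = 1, 2, \ldots, p), \]

where

\[ \theta_s = \sum_{u=0}^{p} a_u C_{s-u} \quad\text{and}\quad C_r = \sum_{t=1}^{n-|r|} X_t X_{t+|r|} \big/ (n-|r|) \]

and k is an arbitrary positive integer ≥ p. Evidently, the equations reduce to

\[ \sum_{s=1}^{k} \hat\theta_s C_{s-u} = 0 \quad (u = 1, 2, \ldots, p) \qquad (1) \]

($\hat\theta_s = \sum_{u=0}^{p} \hat a_u C_{s-u}$) and the estimators are

\[ \hat a_u = R_{u+1,1}/R_{11} \quad (u = 1, 2, \ldots, p), \]

where

\[ R = \begin{pmatrix}
\sum r_s^2 & \sum r_s r_{s-1} & \cdots & \sum r_s r_{s-p} \\
\sum r_s r_{s-1} & \sum r_{s-1}^2 & \cdots & \sum r_{s-1} r_{s-p} \\
\vdots & \vdots & & \vdots \\
\sum r_s r_{s-p} & \sum r_{s-1} r_{s-p} & \cdots & \sum r_{s-p}^2
\end{pmatrix}, \]

all sums running over s = 1, ..., k, where $r_s = C_s/C_0$ and $R_{ij}$ = cofactor of the (i, j)th element in |R| (1 ≤ i, j ≤ p+1).
If we write $n^{1/2}(\hat a_u - a_u) = \delta_u$, it can then be shown that

\[ \delta'\Gamma\Gamma' \sim -n^{1/2}\theta'\Gamma', \qquad (2) \]

where ~ stands for equivalence in probability and

\[ \delta' = (\delta_1, \delta_2, \ldots, \delta_p), \]

\[ \Gamma = \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{k-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{k-2} \\
\vdots & \vdots & & \vdots \\
\gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_{k-p}
\end{pmatrix}, \]

\[ \theta' = (\theta_1, \theta_2, \ldots, \theta_k), \qquad \gamma_r = E(C_r). \]

Hence

\[ \delta' \sim -n^{1/2}\theta'\Gamma'(\Gamma\Gamma')^{-1} \]

and if Λ denotes the asymptotic dispersion matrix of δ′,

\[ \Lambda = (\Gamma\Gamma')^{-1}\,\Gamma\Gamma_k\Gamma'\,(\Gamma\Gamma')^{-1}, \qquad (3) \]

where

\[ \Gamma_r = \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{r-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{r-2} \\
\vdots & \vdots & & \vdots \\
\gamma_{r-1} & \gamma_{r-2} & \cdots & \gamma_0
\end{pmatrix} \quad\text{for all } r \ge 1. \]

This follows from the fact that $n\,\mathrm{cov}(\theta_s, \theta_t) \sim \gamma_{s-t}$. It is known that when the $a_u$ are estimated by the classical method, the asymptotic dispersion matrix $\Lambda_0$ for $n^{1/2}(\hat a_u - a_u)$ (u = 1, 2, ..., p) is given by

\[ \Lambda_0 = \Gamma_p^{-1}. \qquad (4) \]

Hence the generalized efficiency of Kendall's method is the ratio

\[ e_K = |\Gamma\Gamma'|^2 \big/ \big(|\Gamma_p|\,|\Gamma\Gamma_k\Gamma'|\big). \qquad (5) \]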
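It is not difficult to prove that $e_K \le 1$. As a matter of fact, let us write

\[ \xi' = \theta_p'\,\Gamma_p^{-1}, \qquad \eta' = \theta'\,\Gamma', \]

where $\theta_p' = (\theta_1, \theta_2, \ldots, \theta_p)$. The dispersion matrix of the vectors ξ′ and η′ will then be, asymptotically and apart from the factor $n^{-1}$,

\[ \begin{pmatrix}
\Gamma_p^{-1} & \Gamma_p^{-1}\Gamma\Gamma' \\
\Gamma\Gamma'\Gamma_p^{-1} & \Gamma\Gamma_k\Gamma'
\end{pmatrix}. \]

It follows, therefore, that $\Gamma_p^{-1} - \Gamma_p^{-1}\Gamma\Gamma'(\Gamma\Gamma_k\Gamma')^{-1}\Gamma\Gamma'\Gamma_p^{-1}$ is a semi-positive definite matrix and so $|\Gamma_p^{-1}| \ge |\Gamma_p|^{-2}\,|\Gamma\Gamma'|^{2}\,|\Gamma\Gamma_k\Gamma'|^{-1}$, i.e. $e_K \le 1$. Equality holds if k = p.

For the particular case p = 1, this reduces to

\[ e_K = (1-\rho^{2k})^2 \big/ \{1 + \rho^2 - (2k+1)\rho^{2k} + (2k-1)\rho^{2k+2}\} \quad (\rho = -a_1), \qquad (6) \]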


which can be written in the alternative form 





k-2 2k-4 = ne 
am Leto] ECE wy So lf Sart nes 


In particular, $e_K = 1$ for k = 1, and $e_K = 1 - \rho^2(1-\rho^2)/(1+3\rho^2) = (1+\rho^2)^2/(1+3\rho^2)$ for k = 2.


3. GHURYE’S METHOD 
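Let us define $A_q$ (q = 0, ±1, ..., ±p) by

\[ \sum_{u=0}^{p} a_u z^{u} \sum_{u=0}^{p} a_u z^{-u} = \sum_{q=-p}^{p} A_q z^{q} \qquad (7) \]

and let

\[ B_s = \sum_{q=-p}^{p} A_q C_{s-q}. \]

The normal equations for estimating $a_u$, as suggested by Ghurye (1950), are

\[ \frac{\partial}{\partial a_u}\Big(\sum_{s=1}^{k} B_s^2\Big) = 0, \]

i.e.

\[ \sum_{s=1}^{k} \sum_{q=-p}^{p} \hat B_s \frac{\partial \hat A_q}{\partial \hat a_u}\, C_{s-q} = 0 \quad (u = 1, 2, \ldots, p). \]

If we write, as before, $\delta_u = n^{1/2}(\hat a_u - a_u)$ it can then be shown after some simplification that for large n

\[ \delta' G G' \sim -n^{1/2}\beta' G', \qquad (8) \]

where $\beta' = (B_1, B_2, \ldots, B_k)$,

\[ G = \begin{pmatrix}
g_0 & g_1 & \cdots & \cdots & g_{k-1} \\
0 & g_0 & \cdots & \cdots & g_{k-2} \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & g_0 & \cdots & g_{k-p}
\end{pmatrix} \]

and

\[ \sum_{w=0}^{\infty} g_w z^{w} = \Big(\sum_{u=0}^{p} a_u z^{u}\Big)^{-1} \quad (g_0 = 1). \]

It has, further, been proved by Bartlett & Diananda (1950) that $n^{1/2}B_1, n^{1/2}B_2, \ldots, n^{1/2}B_k$ have, asymptotically, independent normal distributions each with zero mean and unit standard deviation. Hence

\[ \delta' \sim -n^{1/2}\beta' G'(GG')^{-1} \]

and if the asymptotic dispersion matrix of δ′ is Λ,

\[ \Lambda = (GG')^{-1} GG' (GG')^{-1} = (GG')^{-1}. \qquad (9) \]

Hence the generalized efficiency is the ratio

\[ e_G = |GG'| / |\Gamma_p|. \]

Here again, if we consider the q.f. $y'(\Gamma_p - GG')y$, where $y' = (y_1, y_2, \ldots, y_p)$ is an arbitrary vector, we can easily show that the q.f.

\[ = \sum_{s=k+1}^{\infty} y' g_s g_s' y \ge 0, \quad\text{where } g_s' = (g_{s-1}, g_{s-2}, \ldots, g_{s-p}). \]

Hence $\Gamma_p - GG'$ is semi-positive definite and $|\Gamma_p| \ge |GG'|$, i.e. $e_G \le 1$.
For the particular case p = 1,

\[ e_G = 1 - \rho^{2k} \quad (\rho = -a_1). \qquad (11) \]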


Thus both Kendall’s and Ghurye’s methods lead to asymptotically inefficient estimators. 
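As a numerical check on the p = 1 results, the generalized efficiencies can be evaluated directly from the matrix definitions (5) and $|GG'|/|\Gamma_p|$ and compared with the closed forms (6) and (11). The sketch below assumes a first-order scheme with $V(Y_t) = 1$, so that $\gamma_r = \rho^{|r|}/(1-\rho^2)$ and $g_w = \rho^w$; the function names are hypothetical.

```python
def kendall_efficiency_ar1(rho, k):
    """e_K from (5) for p = 1, where Gamma is the single row (gamma_0, ..., gamma_{k-1})."""
    gamma0 = 1.0 / (1.0 - rho ** 2)
    row = [gamma0 * rho ** j for j in range(k)]
    gg = sum(g * g for g in row)                       # Gamma Gamma'
    ggg = sum(row[i] * gamma0 * rho ** abs(i - j) * row[j]
              for i in range(k) for j in range(k))     # Gamma Gamma_k Gamma'
    return gg ** 2 / (gamma0 * ggg)

def kendall_closed_form(rho, k):
    """Right-hand side of (6)."""
    q = rho ** 2
    return (1.0 - q ** k) ** 2 / (1.0 + q - (2 * k + 1) * q ** k
                                  + (2 * k - 1) * q ** (k + 1))

def ghurye_efficiency_ar1(rho, k):
    """e_G = |GG'| / |Gamma_p| for p = 1, with g_w = rho**w."""
    gg = sum(rho ** (2 * w) for w in range(k))         # GG'
    gamma0 = 1.0 / (1.0 - rho ** 2)
    return gg / gamma0

if __name__ == "__main__":
    for k in (1, 2, 3, 6):
        print(k, kendall_efficiency_ar1(0.5, k), ghurye_efficiency_ar1(0.5, k))
```

Both efficiencies equal 1 only in limiting cases (k = 1 for Kendall's method with p = 1; k tending to infinity for Ghurye's), which is the sense in which the two methods are asymptotically inefficient.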


4. QUENOUILLE’S METHOD 


Quenouille (1957) has suggested a different method. It leads to efficient estimators and at the same time differs from the classical method in so far as it is an attempt to remove the bias of order $n^{-1}$ in the least-squares estimators; it must be admitted, however, that the normal equations are difficult to solve.

As Quenouille pointed out, the methods of estimation so far employed do not make use of the fact that the functions $\theta_1, \theta_2, \ldots, \theta_k$ are, asymptotically, related and that the dispersion matrix itself will depend on the parameters of the a.r. process. One may, therefore, consider the asymptotic likelihood function for the $a_u$ based on the $\theta_s$'s (Quenouille has used the serial correlations r in the definition of the $\theta_s$'s to include the situations where $V(Y_t)$ is unknown). The normal equations are then derived by maximizing the logarithm of this likelihood function and take the form


{> 00() +25 Or }+2h(Z)  ¥ 00% 


s,t=1 u 8,t=1 Oa, dys.t=1 
OT ee: 0|T 

- 2in(F2) 4 141/n (ee) fit =O (w= 1,2,...,p), (12) 
where $((\gamma^{st})) = \Gamma_k^{-1}$ and $\hat\gamma^{st}$ is the same as $\gamma^{st}$ except that the $a_u$ have been replaced by $\hat a_u$. When terms of order $n^{-1/2}$ only are retained it can easily be seen that the left-hand side expression of (12) reduces to a multiple of $\hat\theta_u$ and hence (12) represent the classical least-squares equations. On the other hand, if we retain terms of order $n^{-1}$ as well, this may help towards correction for bias of order $n^{-1}$.

For the particular case p = 1 and k = 2 a slight modification in Quenouille’s estimator 
ieee ay = —ry—2ry(ra—F})*/(1— FP (13) 
The estimator is, asymptotically, efficient and its bias to order n-! is zero. When p =2 
and k = 3, the equations (12) reduce to 

T0118, + 0,05 + @, 03 + O,(1 +47, + Gyry) + O{2,(1 +14) +74(1 + aj —23)} 
+ O3(r'g +r, + @y)] — 20,(1 —@,) [07 + 03 + 24, (8,0, + 0,05) + 24,8, 0, 
+ 83(1 + 43 —43)] + 579 * ,(1—€,)/n = 0, 
799,03 —@,03 + (0, + 8) {(1 +p) ry +4,} + O,(1 + 20,7, + 3 —4})] 
— 2(p}j—@,) [03 + 03 + 24, (9,8, + 0,85) + 24,0,0, + 03(1 + a} — 3) 
+ 679 *(pj—@,)/n — 7) *(p,—G,)/n = 0. 


(14) 
Solutions to the above equations are, no doubt, difficult to obtain. It can, however, be seen that the classical least squares estimators $\hat a_1$, $\hat a_2$ will provide first approximations to the above solutions and better approximations can be obtained by adding the terms


2027, 72 —(6r,+@,)/n 
and — 2024,73 + 4@,/n 


to @, and @,, respectively, 6, being obtained by replacing the parameters a,, a, in 0, by their 
estimates @,, @, and 7, = 1/(1 —r) (1 —@3). 

One reason for introducing such a variety of estimators is to examine with what precision they estimate the higher-order auto-correlations. It is known that for an a.r. process of order p,

\[ \sum_{u=0}^{p} a_u \gamma_{s-u} = 0 \quad\text{for } s \ge 1. \]

Consequently the estimators $\hat\theta_s = \sum_{u=0}^{p} \hat a_u C_{s-u}$ should have as small a dispersion determinant as possible for as many values of s as ought to be considered (p+1 ≤ s ≤ k). In other words, if Λ is the asymptotic dispersion matrix of $n^{1/2}\hat\theta_{p+1}, n^{1/2}\hat\theta_{p+2}, \ldots, n^{1/2}\hat\theta_k$, then the relative efficiencies of the different methods we have discussed above should be measured by the ratios of the corresponding determinants |Λ|. Considered from this point of view, we have for the classical least squares method (as well as Quenouille's method)

\[ \Lambda_0 = \Gamma_{k-p} - \Gamma_0'\,\Gamma_p^{-1}\,\Gamma_0, \]

where

\[ \Gamma_0 = \begin{pmatrix}
\gamma_p & \gamma_{p+1} & \cdots & \gamma_{k-1} \\
\gamma_{p-1} & \gamma_p & \cdots & \gamma_{k-2} \\
\vdots & \vdots & & \vdots \\
\gamma_1 & \gamma_2 & \cdots & \gamma_{k-p}
\end{pmatrix}. \qquad (15) \]


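For Kendall's method

\[ \Lambda_K = \text{cofactor of the } p\text{th order principal minor in } [I - \Gamma'(\Gamma\Gamma')^{-1}\Gamma]\,\Gamma_k\,[I - \Gamma'(\Gamma\Gamma')^{-1}\Gamma] \qquad (16) \]

(that is, the submatrix formed by the last k − p rows and columns), and for Ghurye's method

\[ \Lambda_G = \Gamma_{k-p} + \Gamma_0'(GG')^{-1}\Gamma_0 - G_0 G'(GG')^{-1}\Gamma_0 - \Gamma_0'(GG')^{-1} G G_0', \qquad (17) \]

where

\[ G_0 = \begin{pmatrix}
0 & \cdots & 0 & g_0 & g_1 & \cdots & g_{k-p-1} \\
\vdots & & \vdots & & \ddots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & g_0
\end{pmatrix}. \]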


It may be noted that if we use serial correlations all throughout instead of the serial covariances, the dispersion matrices will in each case be multiplied by $\gamma_0^{-2}$.
For the particular case p = 1

\[ \Lambda_0 = ((\Lambda_{rs,0})), \quad \Lambda_K = ((\Lambda_{rs,K})), \quad \Lambda_G = ((\Lambda_{rs,G})), \]

where

\[ \Lambda_{rs,0} = (\rho^{|r-s|} - \rho^{r+s})/(1-\rho^2), \]

\[ \Lambda_{rs,K} = \big[\rho^{|r-s|} - \rho^{r+s}\{(r+s)(1-\rho^2) + 2 - \rho^{2k-2r} - \rho^{2k-2s}\}/(1-\rho^{2k}) + \rho^{r+s}\{1 + \rho^2 - (2k+1)\rho^{2k} + (2k-1)\rho^{2k+2}\}/(1-\rho^{2k})^2\big]\big/(1-\rho^2), \]

\[ \Lambda_{rs,G} = \big[\rho^{|r-s|} - \rho^{r+s}(1 - \rho^{2k-2r} - \rho^{2k-2s})/(1-\rho^{2k})\big]\big/(1-\rho^2) \quad (\rho = -a_1). \]

For k = 2

\[ \Lambda_0 = 1, \quad \Lambda_K = 1/(1+\rho^2)^2, \quad \Lambda_G = (1-\rho^2+\rho^4)/\{(1-\rho^2)(1-\rho^4)\}. \]
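The p = 1 expressions for the three dispersion matrices can be checked against the matrix definitions (15) to (17). The sketch below is a hypothetical illustration: it assumes $\gamma_r = \rho^{|r|}/(1-\rho^2)$ and $g_w = \rho^w$, takes $\Lambda_K$ to be the lower-right (k − 1) × (k − 1) block of the matrix in (16), and indexes the entries by r, s = 1, ..., k − 1.

```python
def gam(r, rho):
    """Autocovariance gamma_r of a first-order scheme with V(Y_t) = 1."""
    return rho ** abs(r) / (1.0 - rho ** 2)

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lambda_ls(rho, k):
    """(15) for p = 1: Gamma_{k-1} - Gamma_0' Gamma_1^{-1} Gamma_0."""
    return [[gam(r - s, rho) - gam(r, rho) * gam(s, rho) / gam(0, rho)
             for s in range(1, k)] for r in range(1, k)]

def lambda_kendall(rho, k):
    """(16) for p = 1: lower-right block of (I - P) Gamma_k (I - P)."""
    v = [gam(j, rho) for j in range(k)]          # the single row of Gamma
    vv = sum(x * x for x in v)
    resid = [[(1.0 if i == j else 0.0) - v[i] * v[j] / vv
              for j in range(k)] for i in range(k)]
    gk = [[gam(i - j, rho) for j in range(k)] for i in range(k)]
    m = matmul(matmul(resid, gk), resid)
    return [row[1:] for row in m[1:]]

def lambda_ghurye(rho, k):
    """(17) for p = 1, with G = (1, rho, ..., rho^(k-1))."""
    s_gg = sum(rho ** (2 * w) for w in range(k))                 # GG'
    c = {r: sum(rho ** (2 * w + r) for w in range(k - r))        # rows of G_0 G'
         for r in range(1, k)}
    return [[gam(r - s, rho) + gam(r, rho) * gam(s, rho) / s_gg
             - c[r] * gam(s, rho) / s_gg - gam(r, rho) * c[s] / s_gg
             for s in range(1, k)] for r in range(1, k)]

def closed_kendall(r, s, rho, k):
    q = rho ** 2
    a = rho ** abs(r - s)
    b = rho ** (r + s) * ((r + s) * (1 - q) + 2 - rho ** (2 * k - 2 * r)
                          - rho ** (2 * k - 2 * s)) / (1 - q ** k)
    c = rho ** (r + s) * (1 + q - (2 * k + 1) * q ** k
                          + (2 * k - 1) * q ** (k + 1)) / (1 - q ** k) ** 2
    return (a - b + c) / (1 - q)

def closed_ghurye(r, s, rho, k):
    q = rho ** 2
    return (rho ** abs(r - s) - rho ** (r + s)
            * (1 - rho ** (2 * k - 2 * r) - rho ** (2 * k - 2 * s)) / (1 - q ** k)) / (1 - q)
```

For k = 2 each matrix is a single number, so `lambda_kendall(0.5, 2)[0][0]` can be compared with $1/(1+\rho^2)^2$ directly.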


It, of course, remains to consider the terms of order $n^{-1}$ in the expectations and variances of the various standardized estimators $n^{1/2}(\hat a_u - a_u)$ we have considered above. This will give us some rough idea of the comparative efficiencies of these estimators in small samples. The author intends to present the results of this investigation in some future publications.
Tables of the Freeman-Tukey transformations for the binomial
and Poisson distributions*


By FREDERICK MOSTELLER AND CLEO YOUTZ


Harvard University


SUMMARY


We present a table of the Freeman-Tukey variance stabilizing arc-sine transformation for the binomial distribution together with properties of the transformation. Entries in the table are

\[ \theta = \tfrac12\Big\{\arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}\Big\}, \]

where n is the sample size and x is the number of successes observed in a binomial experiment. Values of θ are given in degrees, to two decimal places, for n = 1[1]50 and x = 0[1]n.

In addition, for completeness, we give a table of the corresponding square-root transformation to two decimal places for use with Poisson counts. The observed count is x (x = 0[1]50) and the transformed values are

\[ g = \sqrt{x} + \sqrt{x+1}; \]

the squares of the transformed values are also given for use in analysis of variance computations.


1. INTRODUCTION 


Transformations are often used in the analysis of data to improve linearity of regression, 
to improve normality of distribution, and to stabilize approximately the variance, when it 
might otherwise depend strongly upon a parameter. Freeman & Tukey (1950) introduced 
the transformations tabled here to stabilize the variance of binomial and of Poisson counts. 
(A good account of variance-stabilizing transformations is given by Eisenhart (1947).) 
The table may also be helpful in using methods developed by Gupta & Sobel (1958) to select 
a subset of populations better than a standard. 

In using the Freeman-Tukey transformations, we found need for tables to speed the work, and they are presented here. The Freeman-Tukey transformation for the binomial number of successes x observed in n independent trials is the averaged angular transformation

\[ \theta = \tfrac12\Big\{\arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}\Big\}. \qquad (1) \]

Table 5 gives values of θ in degrees to two decimals for n = 1[1]50 and x = 0[1]n. When θ is measured in degrees, it has variance $\sigma_\theta^2$ tending to the asymptotic variance

\[ \sigma_\infty^2 = \frac{821}{n+\tfrac12} \quad \text{(squared degrees)}, \qquad (2) \]

for a substantial range of p if n is not too small.
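Transformation (1) and the asymptotic variance (2) are easy to compute directly, which is convenient for values of n beyond the table. The following is a minimal sketch; the function names are hypothetical and angles are returned in degrees to match the table.

```python
import math

def ft_arcsine(x, n):
    """Freeman-Tukey averaged angular transformation (1), in degrees."""
    a = math.asin(math.sqrt(x / (n + 1)))
    b = math.asin(math.sqrt((x + 1) / (n + 1)))
    return math.degrees(0.5 * (a + b))

def asymptotic_variance(n):
    """Asymptotic variance (2), in squared degrees."""
    return 821.0 / (n + 0.5)

if __name__ == "__main__":
    # First few entries of the column n = 10
    for x in range(5):
        print(x, round(ft_arcsine(x, 10), 2))
```

Because $\arcsin\sqrt{1-a} = 90^\circ - \arcsin\sqrt{a}$, the values for x and n − x always add to 90°, which is why only part of each column needs to be tabled.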


* This work was facilitated by a grant from The Ford Foundation and by the Laboratory of Social 
Relations, Harvard University. 
We tabled the function in degrees rather than radians because we wanted to be able to follow from our table into the arc-sine table by Bliss (1946) for n's larger than 50. We did not consider making the correspondence with the arc-sine table by Stevens (1953) because we had already computed our table before we were aware of his.

For Poisson counts, the corresponding transformation is

\[ g = \sqrt{x} + \sqrt{x+1}, \qquad (3) \]

with variance approximately 1, provided the Poisson parameter exceeds 1. Table 6 gives values of g and of g² for x = 0[1]50. We found that the table of g² saved a little time in desk calculations.
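Transformation (3) can be reproduced in the same way. In this sketch (hypothetical function names) the second function mimics a row of Table 6 under the assumption, consistent with the printed entries, that the tabled square is the square of the already rounded value of g.

```python
import math

def ft_sqrt(x):
    """Freeman-Tukey square-root transformation (3) for a Poisson count x."""
    return math.sqrt(x) + math.sqrt(x + 1)

def table6_row(x):
    """g rounded to two decimals, and the square of that rounded value."""
    g = round(ft_sqrt(x), 2)
    return g, round(g * g, 4)

if __name__ == "__main__":
    for x in (0, 1, 2, 3):
        print(x, *table6_row(x))
```

Squaring the rounded value rather than the exact one matters for desk work with the table: for x = 1 it gives 5.8081, whereas the square of the unrounded g is about 5.8284.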


2. SOME PROPERTIES OF THE TRANSFORMATIONS 


The arc-sine transformation. For selected numbers of trials, n, Table 1 shows, approximately, the value of the probability of success on a single trial p (and of 1 − p) at which the maximum of $\sigma_\theta^2$ occurs, the values of $\sigma_\theta^2$ at the maximum and of $\sigma_\infty^2$, and the ratio $\sigma_\theta^2/\sigma_\infty^2$. When $\sigma_\theta^2$ is plotted against p, as n increases, the positions of the maximum variance move toward the extremes, p = 0 and p = 1.

Fig. 1. Values of p (< ½) for maximum $\sigma_\theta^2$ plotted against 1/n, n ≥ 5.


Table 1. Value of p that maximizes $\sigma_\theta^2$ for the arc-sine transformation,
maximum $\sigma_\theta^2$, $\sigma_\infty^2$, and maximum $\sigma_\theta^2/\sigma_\infty^2$


 Sample   Maximizing values of                   σ²_∞ =        Max σ²_θ
 size n      p       1 − p      Max σ²_θ      821/(n + ½)       /σ²_∞
    1      0.5        -          506.25         547.33          0.925
    2      0.5        -          374.50         328.40          1.140
    3      0.5        -          267.19         234.57          1.139
    5      0.344    0.656        157.42         149.27          1.055
   10      0.181    0.819         82.08          78.19          1.050
   20      0.097    0.903         42.20          40.05          1.054
   30      0.066    0.934         28.40          26.92          1.055
   50      0.040    0.960         17.17          16.26          1.056

 Poisson: maximum σ²_g = 1.061
The value of p that maximizes the variance is approximately a linear function of 1/n for large n, see Fig. 1. This is explained below in the discussion of the square-root transformation for the Poisson.

Fig. 2 shows $\sigma_\theta^2/\sigma_\infty^2$ plotted against p for n = 1, 3, 5, 10. At n = 5, the curve is beginning to assume its characteristic shape, though the ears (the shapes near the left and right maxima) are quite flat as yet. For n = 5, the nearly flat portion in the interval 0.3 < p < 0.7 appears to have a relative minimum at p = ½.

Fig. 2. Plot of $\sigma_\theta^2/\sigma_\infty^2$ against p for n = 1, 3, 5, 10.


Table 2. Values of $\sigma_\theta^2$ and of $\sigma_\theta^2/\sigma_\infty^2$ for p = ½


 Sample
 size n      σ²_θ      σ²_θ/σ²_∞
    1       506.25      0.925
    2       374.50      1.140
    3       267.19      1.139
    4       198.98      1.091
    5       156.02      1.045
   10        76.05      0.973
   20        39.20      0.979
   30        26.51      0.985
   50        16.10      0.990

One might suppose from the plots of variances of other arc-sine transformations, such as given by Eisenhart (1947) and from those of Fig. 2, that a relative minimum for the variance would occur at p = ½ for large values of n. Our impression from calculations is that the Freeman-Tukey transformation has a relative maximum at p = ½ for n = 20 and n = 50, where this question was investigated, though the curve of variances is quite flat for a long interval about p = ½. Table 2 gives values of $\sigma_\theta^2$ at p = ½ for various values of n.
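The variances discussed here can be recomputed exactly by summing over the binomial distribution. The sketch below is an illustration with hypothetical function names; it repeats the definition of the transformation so that it runs on its own, and it requires Python 3.8 or later for `math.comb`.

```python
import math

def ft_arcsine(x, n):
    """Freeman-Tukey averaged angular transformation, in degrees."""
    a = math.asin(math.sqrt(x / (n + 1)))
    b = math.asin(math.sqrt((x + 1) / (n + 1)))
    return math.degrees(0.5 * (a + b))

def binomial_variance(n, p):
    """Exact variance of the transformed variable when x is binomial (n, p)."""
    mean = second = 0.0
    for x in range(n + 1):
        w = math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
        t = ft_arcsine(x, n)
        mean += w * t
        second += w * t * t
    return second - mean ** 2

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 10, 20, 30, 50):
        v = binomial_variance(n, 0.5)
        print(n, round(v, 2), round(v / (821.0 / (n + 0.5)), 3))
```

For n = 1 and p = ½ the two possible values 22.5° and 67.5° are equally likely, so the variance is 22.5² = 506.25, the first entry of Table 2.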

Table 3 shows both $\sigma_\theta^2$ and $\sigma_\theta^2/\sigma_\infty^2$ for n = 50 and for various values of p. The values of p are chosen to illustrate the behaviour to the left and right of the high maximum near p = 0.040. There appears to be a relative minimum between p = 0.15 and p = 0.20. In the interval 0.07 ≤ p ≤ 0.93, $\sigma_\theta^2$ is within 2% of $\sigma_\infty^2$. In the interval 0.02 ≤ p ≤ 0.98, $\sigma_\theta^2$ is within 6% of $\sigma_\infty^2$.

Table 3. $\sigma_\theta^2$ and $\sigma_\theta^2/\sigma_\infty^2$ for n = 50 for various values of p


   p       σ²_θ    σ²_θ/σ²_∞        p       σ²_θ    σ²_θ/σ²_∞
 0.01     10.94     0.673         0.10     16.15     0.994
 0.02     15.34     0.943         0.11     16.09     0.990
 0.03     16.85     1.037         0.12     16.06     0.988
                                  0.13     16.04     0.986
 0.039    17.17     1.056         0.14     16.03     0.986
 0.040    17.17     1.056         0.15     16.02     0.985
 0.041    17.17     1.056         0.16     16.02     0.985

 0.05     17.05     1.049         0.20     16.04     0.986
 0.06     16.81     1.034         0.30     16.07     0.989
 0.07     16.57     1.019         0.40     16.09     0.990
 0.08     16.38     1.008         0.50     16.10     0.990
 0.09     16.24     0.999

An alternative arc-sine transformation for the binomial distribution is

\[ \theta^* = \begin{cases}
\arcsin\sqrt{x/n} & (1 \le x \le n-1), \\
\arcsin\sqrt{1/(4n)} & (x = 0), \\
90^\circ - \arcsin\sqrt{1/(4n)} & (x = n).
\end{cases} \]

For n = 50, Fig. 3 relates $\sigma_\theta^2/\sigma_\infty^2$ to p and $\sigma_{\theta^*}^2/\sigma_{\infty*}^2$ to p (where $\sigma_{\infty*}^2 = 821/n$). Only the left half of the curve is plotted (0 ≤ p ≤ 0.5). The variance of the Freeman-Tukey transformation is flatter over a longer region, as of course it is intended to be. Table 4 gives some numerical values of the ratios $\sigma_{\theta^*}^2/\sigma_{\infty*}^2$ for n = 50.


Table 4. $\sigma_{\theta^*}^2$ and $\sigma_{\theta^*}^2/\sigma_{\infty*}^2$ for n = 50 for various values of p


   p      σ²_θ*    σ²_θ*/σ²_∞*
  0.1     17.91      1.091
  0.2     17.11      1.042
  0.3     16.87      1.027
  0.4     16.78      1.022
  0.5     16.76      1.020


In Fig. 3, the reader will observe that over the long flat region the variance ratio curve for θ falls below 1. If one wished a better match between $\sigma_\theta^2$ and $\sigma_\infty^2$ over this interval, he might use $\sigma_\infty^2 = 821/(n+1)$ rather than 821/(n+½) for 10 ≤ n ≤ 50. If he does, then a larger percentage error is committed for p's in the neighbourhood of the ears.


The square-root transformation. Table 6 gives the square-root transformation for the Poisson distribution. If λ is the parameter (mean or expectation) of the Poisson, then for λ = 0, $\sigma_g^2 = 0$, and as λ increases, $\sigma_g^2$ increases to a maximum of about 1.061 in the neighbourhood of λ = 2.1. Thereafter $\sigma_g^2$ decreases, apparently never falling below 1. A graph of $\sigma_g^2$ is given by Freeman & Tukey (1950). For λ ≥ 1, $\sigma_g^2$ is within 6% of its asymptotic value 1.
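The behaviour of $\sigma_g^2$ as a function of λ can be explored numerically by summing over the Poisson distribution. This is a rough sketch with hypothetical names; the summation is cut off well beyond the bulk of the distribution, and the cut-off rule is an arbitrary choice rather than anything taken from the paper.

```python
import math

def poisson_variance(lam, extra_terms=60):
    """Variance of g = sqrt(x) + sqrt(x + 1) when x is Poisson with mean lam."""
    upper = int(lam + 12.0 * math.sqrt(lam)) + extra_terms
    w = math.exp(-lam)          # P{x = 0}
    mean = second = 0.0
    for x in range(upper + 1):
        g = math.sqrt(x) + math.sqrt(x + 1)
        mean += w * g
        second += w * g * g
        w *= lam / (x + 1)      # recurrence for P{x + 1}
    return second - mean ** 2

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0, 2.1, 3.0, 5.0, 10.0, 20.0):
        print(lam, round(poisson_variance(lam), 4))
```

Scanning a grid of λ values with this function is one way to locate the maximum near λ = 2.1 and to see how quickly the variance settles towards 1.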
When a binomial p is small and n is large, the binomial distribution is approximated by the Poisson with parameter λ = np. Furthermore, when θ is measured in radians, for small values of p, we have $\arcsin\sqrt{p} \approx \sqrt{p}$. Thus the square-root transformation for the Poisson and the arc-sine transformation are essentially equivalent in distribution provided that p and (x+1)/(n+1) are small. We can expect therefore, even for large values of n, that the ears of plots of the variance against p for the arc-sine transformation do not vanish, but that the maximum value of $\sigma_\theta^2/\sigma_\infty^2$ exceeds 1 by about 6%, and that this maximum occurs near the value of p where np = 2.1. For n = 50, we found the maximizing value of p to be 0.040 (Table 1), as compared with the Poisson estimate 0.042. This explains the approximately linear relation shown in Fig. 1 between the maximizing value of p and 1/n.

Fig. 3. Graph relating $\sigma_\theta^2/\sigma_\infty^2$ to p and relating $\sigma_{\theta^*}^2/\sigma_{\infty*}^2$ to p for n = 50.


The behaviour of the ears (their persistence) with increasing values of n is reminiscent of Gibbs's phenomenon when a Fourier series is fitted to a discontinuous function. Perhaps the analogy is not far-fetched since the transformation is designed to fit a continuous function with ordinates $y(p) = \sigma_\theta^2/\sigma_\infty^2$ and end-points (0, 0) and (1, 0) to the function f(p) = 1 (0 ≤ p ≤ 1).


REFERENCES AND TABLES USED IN CALCULATIONS 
Table 5. Table of Freeman-Tukey arc-sine transformation for binomial proportions*


The table entries are

\[ \theta = \tfrac12\Big\{\arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}\Big\}, \]

where x = number of successes observed, n = sample size.


n 
pe 1 2 3 4 5 6 7 8 9 10 
x 


22-50 17-63 15-00 13-28 12-05 11-10 10-35 9-74 9-22 8-77 
67-50 45-00 37-50 32-90 29-68 27-26 25-35 23-80 22-50 21-39 
_- 72-37 52-50 45-00 40-13 36-60 33-88 31-69 29-89 28-36 


-—— — — 76°72 60-32 53-40 48-62 45-00 42-12 39-74 
—- a= — — 77-95 62-74 56-12 51-46 47-88 45-00 


8-39 8-05 7°75 7:48 7°24 7-02 6-82 6-63 6-46 6-30 
20°44 19-60 18-85 18-19 17-59 1.-05 16-55 16-10 15-68 15-29 
27-05 25-90 24-89 23-99 23-18 22-45 21-78 21-17 20-61 20-09 
32-63 31-20 29-94 28°83 27-83 26-93 26-11 25-36 24-68 24-04 
37°73 36-01 34-51 33-18 31-99 30-93 29-97 29-09 28-28 27-54 


40-56 38-80 37-25 35°87 34-64 33-54 32-53 31-61 30-76 
47-40 45-00 42-95 41-16 39-59 38-18 36-92 35-78 34-74 33-79 
52-27 49-44 47-05 45-00 43-20 41-62 40-20 38-91 37-75 36-69 
57°37 53-99 61-20 48-84 46-80 45-00 43-41 41-97 40-68 39-50 
62-95 58-80 55°49 52-75 50-41 48-38 46-59 45-00 43-57 . 42-26 


64-10 60-06 56-82 54:13 51-82 49-80 48-03 46-43 45-00 


22 23 24 25 26 27 28 29 30 


6-15 6-02 5-89 5°77 5-65 5°55 5-45 5-35 5-26 5-17 
14-93 14-59 14-28 13-98 13-71 13-44 13-20 12-96 12-74 12-53 
19-61 19-16 18-74 18-35 17-98 17-63 17-30 16-99: 16-70 16-42 
23-46 22-91 22-40 21-92 21:48 21-05 20-66 20-28 19-93 19-59 
26-86 26-22 25-63 25-07 24-55 24-06 23-60 23-17 22-76 22-37 


29-98 29°25 28-58 27-95 27-36 26-81 26-29 25-79 25-33 24-89 
32-91 32-10 31-34 30-64 29-98 29°37 28-79 28-24 27-72 27-24 
35-71 34°81 33-98 33-20 32-47 31-79 31-16 30-55 29-99 29-45 
38-42 37-43 36-51 35-66 34°86 34-12 33-42 32-77 32°15 31-57 
41-08 39-99 38-98 38-05 37°18 36-38 35-62 34-91 34°24 33-61 


10 43-70 42-50 41-41 40-39 39-45 38-58 37-76 36-99 36-27 35-58 
il 46-30 45-00 43-80 42-70 41-68 40-74 39-85 39-03 38-25 37-52 
12 48-92 47-50 46-20 45-00 43-90 42-87 41-92 41-03 40-20 39-42 
13 51-58 50-01 48-59 47-30 46-10 45-00 43-98 43-02 42-13 41-29 
14 54-29 62-57 51-02 49-61 48-32 47-13 46-02 45-00 44-04 43-15 


15 57-09 55-19 53-49 51:95 60-55 49-26 48-08 46-98 45:96 45-00 


* For an entry x not in the table (n ≤ 50), take the complement with respect to 90° of the entry for n − x, e.g. for n = 16, x = 11, take 90 − 34.64 = 55.36. The entries printed in italics are, in fact, complements of entries printed higher in that column.
20


33 


4-94 
11-96 
15-66 
18-67 
21-30 
23-69 
25-91 
28-00 
29-99 
31-90 
33-75 
35-56 
37-32 
39-06 
40-77 
42-47 
44-16 
45-84 
47-53 
49-23 


50-94 


Table 5 (cont.) 


35 


4-80 
11-61 
15-21 
18-12 
20-68 
22-99 
25-13 
27-15 
29-06 
30-90 
32-68 
34-41 
36-10 
37-76 
39-39 
41-01 
42-61 
44-20 
45-80 
47-39 


48-99 


45 


36 


4-73 
11-45 
14-99 
17-87 
20-38 
22-66 
24-76 
26-75 
28-63 
30-44 
32-18 
33-88 
35-53 
37-16 


38-75 | 


40-33 
41-90 
43-45 
45-00 
46-55 


48-10 
4-67
11-30 
14-79 
17-63 
20-10 
22-34 
24-41 
26-36 
28-22 
29-99 
31-71 
33°37 
34-99 
36-58 
38°15 
39-69 
41-22 
42-74 
44-25 
45-75 


47-26 


38 


4-61 
11-15 
14-60 
17-39 
19-83 
22-04 
24-08 
26-00 
27-82 
29-57 
31-25 
32-88 
34-48 
36-04 
37-57 
39-08 
40-57 
42-06 
43-53 
45-00 


46-47 
Table 6. Table of values for the Freeman-Tukey square-root transformation*


The observed count is x and the transformed values are √x + √(x+1). The squares of the transformed values are also given for use in analysis of variance computations.


   x    √x+√(x+1)   (√x+√(x+1))²       x    √x+√(x+1)   (√x+√(x+1))²
   0      1.00         1.0000         26     10.30       106.0900
   1      2.41         5.8081         27     10.49       110.0401
   2      3.15         9.9225         28     10.68       114.0624
   3      3.73        13.9129         29     10.86       117.9396
   4      4.24        17.9776         30     11.04       121.8816
   5      4.69        21.9961         31     11.22       125.8884
   6      5.10        26.0100         32     11.40       129.9600
   7      5.47        29.9209         33     11.58       134.0964
   8      5.83        33.9889         34     11.75       138.0625
   9      6.16        37.9456         35     11.92       142.0864
  10      6.48        41.9904         36     12.08       145.9264
  11      6.78        45.9684         37     12.25       150.0625
  12      7.07        49.9849         38     12.41       154.0081
  13      7.35        54.0225         39     12.57       158.0049
  14      7.61        57.9121         40     12.73       162.0529
  15      7.87        61.9369         41     12.88       165.8944
  16      8.12        65.9344         42     13.04       170.0416
  17      8.37        70.0569         43     13.19       173.9761
  18      8.60        73.9600         44     13.34       177.9556
  19      8.83        77.9689         45     13.49       181.9801
  20      9.05        81.9025         46     13.64       186.0496
  21      9.27        85.9329         47     13.78       189.8884
  22      9.49        90.0601         48     13.93       194.0449
  23      9.69        93.8961         49     14.07       197.9649
  24      9.90        98.0100         50     14.21       201.9241
  25     10.10       102.0100


* This table was originally published in Gardner Lindzey (editor), Handbook of Social Psychology, vol. 1, Addison-Wesley, 1954, p. 327, in a chapter by Frederick Mosteller and Robert R. Bush entitled 'Selected Quantitative Techniques' and is reproduced here with the permission of the Addison-Wesley Publishing Company, Inc.
A non-null ranking model for a sequence of m alternatives*


By C. F. CROUSE
The South African Iron and Steel Industrial Corporation, Pretoria


1. INTRODUCTION


Non-null ranking models, based on the method of paired comparisons, were put forward by Mallows (1957). One of these, called the φ-model, was extended by Barton, David & Mallows (1958), to serve as an interpretation and alternative to randomness in a sequence of two alternatives.

In this paper we consider a further extension of this φ-model to the case of a sequence of m alternatives.


2. THE φ-MODEL


We are given an ordered sequence of $N = \sum_{i=1}^{m} n_i$ elements, $x_{i,j}$ ($j = 1, \ldots, n_i$; $i = 1, \ldots, m$), and for convenience we will refer to $x_{i,j}$ ($j = 1, \ldots, n_i$) as a sample from the ith population. (It is assumed that ties do not occur in the ordered sequence.) We are concerned with differences between the populations, and hence distinguish $N!/(n_1! \ldots n_m!)$ different sequences. It is now assumed that this ordered sequence is the outcome of a system of paired comparisons, performed as follows. Each element of every sample is compared independently with every element of every other sample. If, following these comparisons, the resulting sequence contains any inconsistencies of the form $x_{i,j} < x_{k,l} < x_{r,s} < x_{i,j}$, it is discarded and a further set of comparisons is performed. This procedure may be supposed to be repeated until a consistent sequence is obtained. As emphasized by Mallows (1957) no experiment would be performed in this fashion, this only being a model for situations where the observed data form a consistent sequence. It is further assumed that there is no distinction between elements of the same sample so that the paired comparisons are made according to the rule

\[ P\{x_{i,r} > x_{j,s}\} = p_{i,j}. \]

Define

\[ \phi_{i,j} = p_{i,j}/(1 - p_{i,j}). \]

The null hypothesis, that the $N!/(n_1! \ldots n_m!)$ different sequences are all equally likely, corresponds to

\[ \phi_{i,j} = 1, \quad\text{or}\quad p_{i,j} = \tfrac12 \quad (1 \le i < j \le m), \qquad (1) \]

while the φ-alternative is given by

\[ \phi_{i,j} \ne 1 \quad\text{for at least one } i, j. \]

3. THE DISTRIBUTION OF RANKS UNDER THE φ-ALTERNATIVE


Let $U_{i,j}$ be the number of times an $x_{i,r}$ $(r = 1,\ldots,n_i)$ precedes an $x_{j,s}$ $(s = 1,\ldots,n_j)$ in the ranked sequence.
Under the φ-alternative the probability of obtaining a given consistent sequence, $u$ say, is

$$P(u \mid \phi) \propto \prod_{i<j} p_{i,j}^{u_{i,j}} (1 - p_{i,j})^{n_i n_j - u_{i,j}} = \prod_{i<j} \phi_{i,j}^{u_{i,j}} (1 - p_{i,j})^{n_i n_j}.$$

It follows that under the φ-alternative the set $U_{i,j}$, $1 \le i < j \le m$, is sufficient for the set of
parameters $\phi_{i,j}$, $1 \le i < j \le m$, and we thus restrict attention to the distribution of the $U_{i,j}$. If $P(U \mid 1)$ is
the probability of obtaining a given set $U$, say $U_{i,j}$, $1 \le i < j \le m$, under the null hypothesis (1), and
$P(U \mid \phi)$ the corresponding probability under the φ-alternative, then it can be shown that

$$P(U \mid \phi) \propto P(U \mid 1) \prod_{i<j} \phi_{i,j}^{U_{i,j}} (1 - p_{i,j})^{n_i n_j},$$

or

$$P(U \mid \phi) = \frac{P(U \mid 1) \prod_{i<j} \phi_{i,j}^{U_{i,j}}}{\sum_{U} P(U \mid 1) \prod_{i<j} \phi_{i,j}^{U_{i,j}}}. \tag{2}$$


* Part of Ph.D. thesis submitted and approved, at the University of London.


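From (2) the moment generating function of the set $U$ is

$$M(t \mid \phi) = E\Bigl[\exp \sum_{i<j} t_{i,j} U_{i,j}\Bigr]
= \frac{\sum_{U} P(U \mid 1) \prod_{i<j} \phi_{i,j}^{U_{i,j}} \exp\bigl(\sum_{i<j} t_{i,j} U_{i,j}\bigr)}{\sum_{U} P(U \mid 1) \prod_{i<j} \phi_{i,j}^{U_{i,j}}}
= \frac{M(t + \theta \mid 1)}{M(\theta \mid 1)},$$

where $t = \{t_{1,2}, t_{1,3}, \ldots, t_{m-1,m}\}$, $\theta = \{\theta_{1,2}, \theta_{1,3}, \ldots, \theta_{m-1,m}\}$
and $\theta_{i,j} = \log_e \phi_{i,j}$.


Thus if $K(t \mid 1)$ and $K(t \mid \phi)$ are the cumulant generating functions under the respective hypotheses, then

$$K(t \mid \phi) = K(t + \theta \mid 1) - K(\theta \mid 1). \tag{3}$$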


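Let $\kappa^{\phi}_{r_{1,2},\ldots,r_{m-1,m}}$ and $\kappa^{1}_{r_{1,2},\ldots,r_{m-1,m}}$ be the respective coefficients of
$\prod_{i<j} t_{i,j}^{r_{i,j}}/r_{i,j}!$ in $K(t \mid \phi)$ and $K(t \mid 1)$. Expanding both
sides of (3) and equating the coefficients of $\prod_{i<j} t_{i,j}^{r_{i,j}}/r_{i,j}!$ we obtain

$$\kappa^{\phi}_{r_{1,2},\ldots,r_{m-1,m}} = \sum_{s_{1,2} \ge r_{1,2}} \cdots \sum_{s_{m-1,m} \ge r_{m-1,m}} \kappa^{1}_{s_{1,2},\ldots,s_{m-1,m}} \prod_{i<j} \frac{\theta_{i,j}^{\,s_{i,j}-r_{i,j}}}{(s_{i,j}-r_{i,j})!}.$$
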
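Crouse (1960) has shown that under the null hypothesis the joint distribution of $U_{i,j}/\sqrt{\operatorname{var} U_{i,j}}$,
$1 \le i < j \le m$, is asymptotically normal, and furthermore if there exists $\epsilon > 0$ such that

$$1 - \epsilon > \lim_{N \to \infty} n_i/(n_i + n_j) > \epsilon$$

for all $i, j$, then under the null hypothesis the moments of $U_{i,j}/\sqrt{\operatorname{var} U_{i,j}}$ $(1 \le i < j \le m)$ tend to the
moments of their limiting distribution with $O(N^{-1})$, and hence, as the variance of $U_{i,j}$ under the null
hypothesis is

$$\tfrac{1}{12} n_i n_j (n_i + n_j + 1) = O(N^3),$$

$$\kappa^{1}_{s_{1,2},\ldots,s_{m-1,m}} = O\bigl(N^{\frac32 \sum s_{i,j} - 1}\bigr) \quad\text{for}\quad \sum_{i<j} s_{i,j} \ge 3.$$

Thus under the φ-alternative with

$$\log_e \phi_{i,j} = \theta_{i,j} = c_{i,j}/N^{3/2}, \tag{4}$$

$$\prod_{i<j} \bigl[\tfrac{1}{12} n_i n_j (n_i + n_j + 1)\bigr]^{-\frac12 r_{i,j}} \kappa^{\phi}_{r_{1,2},\ldots,r_{m-1,m}}
= O\Bigl(N^{-1} \sum_{s_{1,2} \ge r_{1,2}} \cdots \sum_{s_{m-1,m} \ge r_{m-1,m}} \prod_{i<j} \frac{c_{i,j}^{\,s_{i,j}-r_{i,j}}}{(s_{i,j}-r_{i,j})!}\Bigr)$$

for $\sum r_{i,j} \ge 3$, and as $\sum_k a^k/k!$ converges for constant $a$, the above tends to zero when $N$ tends to infinity.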
For the lower order deena it is easily found that under (4) 


&(U 3) = 4nyny t+ N-*D cy, ,00V, (Ui, 5 Un, x) + O(N), (5) 
h<k 


cov (U, ;, Uy, 4) = cov, (U4, 3, On, x) + O(N*), 


where $\operatorname{cov}_1(U_{i,j}, U_{h,k})$, the covariance under the null hypothesis, is equal to

$\tfrac{1}{12} n_i n_j (n_i + n_j + 1)$ when $i = h$, $j = k$,
$\tfrac{1}{12} n_i n_j n_k$ when $i = h$, $j \ne k$,
$0$ when $i \ne j \ne h \ne k$.

Hence:


THEOREM 1. If as $N$ tends to infinity, there exists $\epsilon > 0$ such that $1 - \epsilon > \lim n_i/(n_i + n_j) > \epsilon$, and the
limit exists, for all $i, j$, then under the φ-alternative

$$\log_e \phi_{i,j} = c_{i,j}/N^{3/2} \quad (1 \le i < j \le m),$$

the joint distribution of $U_{i,j}/\sqrt{\operatorname{var}_1(U_{i,j})}$, $1 \le i < j \le m$, is asymptotically normal, and apart from a shift
in the mean the same as under the null hypothesis.


As shown in unpublished work by Crouse (1960), the rank of the asymptotic distribution of
$U_{i,j}/\sqrt{\operatorname{var}_1(U_{i,j})}$ $(1 \le i < j \le m)$,
under the null hypothesis, and consequently from Theorem 1 also under (4), is $(m-1)$; the $\binom{m-1}{2}$
linearly independent components for which the asymptotic distribution is degenerate being of the form


N-*T), «5 = Nn Op, ~— 4ngn,) +0(U,, 5— 4nyn,) +n (Uy, ,— ynjn)). 
Under the null hypothesis N-*é(T,, «,3) = 9, 
while from (5) it readily follows that under (4) 
N-6(T,, 4,3) = O(N), 


and the expectation of the degenerate component is thus asymptotically insensitive against (4). Further- 
more, under the null hypothesis 
var T'y, 4,5 = (a's) MAyN,(M, +N, +5), 


thus even the expectation of the degenerate component standardized under the null hypothesis is 
asymptotically insensitive against (4). 

Finally, it can easily be shown that asymptotically the set $U_{1,j}/\sqrt{\operatorname{var}_1(U_{1,j})}$ $(j = 2,\ldots,m)$, or any
similar set, determines the distribution of the set $U_{i,j}/\sqrt{\operatorname{var}_1(U_{i,j})}$ $(1 \le i < j \le m)$, both under the null
hypothesis and the alternative (4).

Hence in the limit the problem of testing the null hypothesis against the alternative (4) reduces to:
Given a non-singular $(m-1)$-dimensional normal distribution with known variance-covariance matrix,
on the basis of a single observed set of values, it is required to test the null hypothesis that all mean
values are equal to zero (that is, considering the variables in the form $(U_{1,j} - \tfrac12 n_1 n_j)/\sqrt{\operatorname{var}_1(U_{1,j})}$) against
the alternative that at least one mean value is different from zero. This is a special case of the linear
hypothesis model (cf. Lehmann, 1959, p. 304) and the corresponding likelihood ratio test statistic, for
large samples, is

$$U' B^{-1} U,$$

where $U$ is the column vector $(U_{1,2} - \tfrac12 n_1 n_2, \ldots, U_{1,m} - \tfrac12 n_1 n_m)$, and $B$ the variance-covariance matrix
of this set. Under the hypotheses (1) and (4) the latter can be shown to be asymptotically equivalent to

$$U_m = \frac{12}{N + \tfrac12 m} \sum_{i<j} \frac{(U_{i,j} - \tfrac12 n_i n_j)^2}{n_i n_j}.$$

Under (1) $U_m$ is asymptotically distributed as a $\chi^2$ with $(m-1)$ degrees of freedom, while under (4) as
a non-central $\chi^2$ with $(m-1)$ degrees of freedom and non-centrality

$$\lim_{N \to \infty} \frac{12}{N + \tfrac12 m} \sum_{i<j} \frac{\{E(U_{i,j}) - \tfrac12 n_i n_j\}^2}{n_i n_j}.$$
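The statistic $U_m$ can be computed directly from the precedence counts. The following is a minimal Python sketch, not the author's procedure: the function names are hypothetical, "precedes" is taken to mean "is smaller than", the divisor is read as $N + \tfrac12 m$, and ties are assumed absent, as in the model.

```python
from itertools import combinations


def precedence_count(a, b):
    """U_{i,j}: number of pairs (x in a, y in b) with x before y (assumed: x < y)."""
    return sum(1 for x in a for y in b if x < y)


def u_m_statistic(samples):
    """Large-sample statistic U_m for m samples without ties.

    Assumes the form 12 / (N + m/2) * sum_{i<j} (U_ij - n_i n_j / 2)^2 / (n_i n_j).
    """
    m = len(samples)
    big_n = sum(len(s) for s in samples)
    total = 0.0
    for i, j in combinations(range(m), 2):
        n_i, n_j = len(samples[i]), len(samples[j])
        u_ij = precedence_count(samples[i], samples[j])
        total += (u_ij - 0.5 * n_i * n_j) ** 2 / (n_i * n_j)
    return 12.0 / (big_n + 0.5 * m) * total


# Two completely separated samples of size 3.
print(u_m_statistic([[1, 2, 3], [4, 5, 6]]))
```

For $m = 2$ the value equals the squared standardized Wilcoxon (Mann-Whitney) statistic, $(U - \tfrac12 n_1 n_2)^2 / \{\tfrac1{12} n_1 n_2 (N+1)\}$, which is one way to check the sketch.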


4. CONCLUDING REMARKS 


(a) For two samples $U_m$ reduces to Wilcoxon's test.
(b) Barton et al. (1958) have mentioned various situations in which the two-sample test can be
applied, and similarly one can give examples for the m-sample test; for instance:

From each of $m$ different nationalities a sample of $n_i$ $(i = 1,\ldots,m)$ men is taken, the $N = \sum_{i=1}^{m} n_i$ men
all being of the same age. A judge is now requested to arrange the $N$ men in order of increasing age. If
the judge is unbiased he should produce a random sequence; a significant value of $U_m$ will thus reflect
that men of some nationalities tend to look older than men of other nationalities.

(c) Kruskal (1952) put forward an m-sample extension of Wilcoxon's test, for testing, among other
things, the equality of means of m independent samples, and likewise, as will be taken up in a further
publication, the test based on $U_m$ can be used for this problem.
Critical values of the coefficient of rank correlation for testing
the hypothesis of independence


By GERALD J. GLASSER AND ROBERT F. WINTER
New York University


A common statistical problem involves the selection of a simple random sample of n bivariate
observations in order to test the hypothesis that the two variables are statistically independent. As is
well known, one non-parametric test of this hypothesis may be based on Spearman's coefficient of rank
correlation, $r_s$, or on an equivalent statistic, $D$, the sum of the squared differences of the ranks. The
relationship between $r_s$ and $D$ is simply

$$r_s = 1 - \frac{6D}{n(n^2-1)} \quad\text{or}\quad D = \tfrac16 n(n^2-1)(1 - r_s). \tag{1}$$

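As an illustration of relation (1), the short Python sketch below computes $D$ and $r_s$ from two sets of untied observations. The function names are hypothetical and no tie handling is attempted, in line with the no-ties restriction stated in the paper.

```python
def ranks(values):
    """Ranks 1..n of untied values (smallest value gets rank 1)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def spearman_d(x, y):
    """D: sum of squared differences between the two rankings."""
    return sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))


def spearman_r(x, y):
    """r_s obtained from D through relation (1)."""
    n = len(x)
    return 1.0 - 6.0 * spearman_d(x, y) / (n * (n * n - 1))


x = [10, 20, 30, 40, 50]
print(spearman_d(x, [5, 4, 3, 2, 1]), spearman_r(x, [5, 4, 3, 2, 1]))
```

A completely reversed ranking attains the maximum $D = \tfrac13 n(n^2-1)$ and $r_s = -1$.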

This paper presents approximate critical values of $D$ and $r_s$ for one-tailed α-risks of 0.001, 0.005,
0.010, 0.025, 0.050 and 0.100 with n = 11(1)30 for use in testing the hypothesis of independence. The
values are calculated from a Gram-Charlier Type A series approximation to the distribution function
of $r_s$ given by David, Kendall & Stuart (1951). The results apply, strictly speaking, only in the case
where no tied ranks occur, or where the procedure of breaking ties at random is adopted.

Under the hypothesis of independence the exact probabilities of values of $D$ are based on their relative
frequencies in the n! permutations of one ranking against the other. These have been calculated for
samples of up to size 10 by Olds (1938), Kendall, Kendall & Babington Smith (1939), and David et al.
(1951). However, the computations required are extensive and one could not hope to have them carried
much further.

For samples with n > 10, or at least when n > 25 or 30, it is not uncommon in the literature to find
a testing procedure based on the asymptotic standardized normality of

$$z = r_s \sqrt{n-1}. \tag{2}$$

Olds (1949) gives tables for this purpose. Because $D$ may assume only even values, a correction for
continuity may also be introduced by increasing (decreasing) $D$ by unity for values below (above) its mean.
An alternative approximation, suggested by Pitman (1937), consists of approximating the sampling
distribution of $r_s$ by a β-distribution, and may be employed by referring to tables of the t-distribution.
This leads to the comparison of

$$t = r_s \sqrt{\frac{n-2}{1 - r_s^2}} \tag{3}$$

with $t^*$, a critical value based on the t-distribution with n − 2 degrees of freedom. This procedure gives
the test exactly the same form as that used in normal theory for the ordinary product moment coefficient
of correlation. Tables of critical values based on this approximation have been constructed by Litchfield
& Wilcoxon (1955) and by Teegarden (1960). A further correction for continuity in computing $r_s$ for $t$
involves dividing $D$ by $\tfrac16 n(n^2-1) + 1$ to match the range of the β-distribution.
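The two simple approximations can be sketched in Python as follows. This is an illustrative sketch only: it assumes the normal statistic $z = r_s\sqrt{n-1}$ and the product-moment form $t = r_s\sqrt{(n-2)/(1-r_s^2)}$, the function names are hypothetical, and no continuity correction is applied.

```python
import math


def z_statistic(r_s, n):
    """Normal approximation: r_s * sqrt(n - 1), referred to the unit normal."""
    return r_s * math.sqrt(n - 1)


def t_statistic(r_s, n):
    """t approximation with n - 2 degrees of freedom (requires |r_s| < 1)."""
    return r_s * math.sqrt((n - 2) / (1.0 - r_s ** 2))


def normal_upper_tail(z):
    """P{Z > z} for a unit normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n, r_s = 10, 0.6
print(z_statistic(r_s, n), normal_upper_tail(z_statistic(r_s, n)))
print(t_statistic(r_s, n))
```

The value of `t_statistic` would then be compared with a tabulated critical value $t^*$ on n − 2 degrees of freedom; that lookup is left out here because it needs a t table or a statistics library.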

More elaborate and probably more exact methods of approximating the sampling distribution of
$r_s$, or $D$, have been given by David et al. (1951). The authors give the moments and the cumulants of
$r_s$ up to those of the eighth order and use these results in developing the leading terms of an expansion
of a distribution function based on Edgeworth's form of the Gram-Charlier Type A series. These results
appear to enable one to calculate, with somewhat greater accuracy than either the normal or
β-approximations, probabilities of exceeding given values of $D$ or $r_s$ under the hypothesis of independence.

The superiority of the Type A series approximation is pointed up by the results given in Table 1.
The table shows several values of $P\{D < D^*\}$ when n = 10 and estimates thereof, based on the three
approximations mentioned, for values of $D^*$ at probability levels of about 0.001, 0.005, 0.010, 0.025,
0.050, and 0.100. The calculations of the standardized values involved here are straightforward
(corrections for the discontinuity of $D$ were employed) and reference is made to the appropriate tables to
determine corresponding probabilities. As might be expected, the normal approximation shows up
badly at all but the 0.05 level, and the β appears superior. The estimates based on the Type A series are
consistently better than either of the other two, although at some levels the β-approximation is almost
as good.


Table 1. Exact and estimated values of P{D < D*} for n = 10


            Probabilities                        Errors
D*   Normal   Beta     Type A   Exact     Normal    Beta      Type A
22   0.00442  0.00059  0.00098  0.00080   +0.00362  -0.00021  +0.00018
24   0.00491  0.00082  0.00134  0.00109   +0.00382  -0.00027  +0.00025
36   0.00904  0.00371  0.00494  0.00439   +0.00465  -0.00068  +0.00055
38   0.00999  0.00456  0.00587  0.00527   +0.00472  -0.00071  +0.00060
44   0.01328  0.00776  0.00926  0.00870   +0.00458  -0.00094  +0.00056
46   0.01456  0.00912  0.01095  0.01012   +0.00444  -0.00100  +0.00051
60   0.02699  0.02346  0.02479  0.02449   +0.00250  -0.00103  +0.00019
62   0.02931  0.02639  0.02744  0.02722   +0.00209  -0.00083  +0.00008
74   0.04716  0.04829  0.04788  0.04814   -0.00098  +0.00015  -0.00020
76   0.05092  0.05274  0.05218  0.05244   -0.00152  +0.00030  -0.00017
92   0.08915  0.09838  0.09510  0.09526   -0.00647  +0.00276  -0.00024
94   0.09527  0.10527  0.10212  0.10222   -0.00695  +0.00305  -0.00010


The Type A series approximation, despite its apparent excellence, is unfortunately complex and
perhaps too burdensome to use in many applications. This paper presents results designed to alleviate
this difficulty. The approximation has been used for n = 11(1)30 to calculate probabilities for values
of $D$ in the neighbourhood of widely used probability levels. The computations followed along the lines
suggested in the paper by David and others, except that Sheppard's corrections for the moments were
not introduced (the effect would have been negligible). A correction for discontinuity, however, was
employed by deducting unity from each value of $D$. Several checks for consistency were made.

In computing probabilities, standardized deviates were computed to three decimal places (e.g. 1.649)
and probabilities for these deviates to five places (e.g. 0.00486). A more extended calculation might set
a few of the critical values given below 2 units higher, or lower, but in no case more. Copies of tables
showing the approximated probabilities are available from the authors on request.

From the results, critical values of $D$ have been selected such that, according to the approximation,
each is the largest value of $D^*$ satisfying

$$P\{D < D^*\} \le \alpha. \tag{4}$$

Table 2 summarizes these values of $D^*$ for n > 10 together with exact critical values for n ≤ 10.
Upper-tail critical values of $D$ are obtained from the fact that the distribution is symmetrical on the range
$(0, \tfrac13 n(n^2-1))$.
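For small n the selection rule can be applied to the exact permutation distribution of $D$. The Python sketch below is illustrative rather than the authors' computation: it reads the rule as "largest even $D^*$ with $P\{D < D^*\} \le \alpha$", enumerates all n! permutations (practical only for roughly n ≤ 9), and uses hypothetical function names. It returns `None` where no positive critical value exists.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def exact_d_counts(n):
    """Frequency of each value of D over the n! permutations of 0..n-1."""
    counts = {}
    for perm in permutations(range(n)):
        d = sum((i - p) ** 2 for i, p in enumerate(perm))
        counts[d] = counts.get(d, 0) + 1
    return counts


def lower_critical_value(n, alpha):
    """Largest even D* > 0 with P{D < D*} <= alpha, or None if there is none.

    Pass alpha as a string such as "0.05" so that it is represented exactly.
    """
    counts = exact_d_counts(n)
    total = factorial(n)
    alpha = Fraction(alpha)
    d_max = n * (n * n - 1) // 3      # largest attainable value of D
    best = None
    below = 0                         # number of permutations with D < d_star
    for d_star in range(0, d_max + 3, 2):
        if d_star > 0 and Fraction(below, total) <= alpha:
            best = d_star
        below += counts.get(d_star, 0)
    return best


print([lower_critical_value(5, a) for a in ("0.01", "0.025", "0.05", "0.1")])
```

For n = 5 this gives 2, 2, 4 and 6 at the 0.010, 0.025, 0.050 and 0.100 levels.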
A comparison of the values of $D^*$ shown in Table 2 with similar values based on the normal and the
β-distributions indicates systematic and, sometimes, substantial differences.

In borderline cases the choice of a table of critical values may determine whether or not the hypothesis
of independence is accepted. For example, in his basic statistics textbook, McCarthy (1957) provides
an interesting illustration of rank correlation in testing the hypothesis that the success of mediation
and the amount of hostility in early labour-management negotiations are independent against the
one-tailed alternative that the variables are inversely related. In the illustration n = 12 and the results
show D = 454. Application of the Olds (1949) table based on the normal approximation leads the
author to suggest accepting the hypothesis of independence, for α = 0.025. Table 2, however, shows
that the lower 2.5% level for $D$ is 120, so that the upper 2.5% level is at 572 − 120 = 452. The hypothesis
would therefore be rejected at this level.


Table 2. Approximate lower-tail critical values, D*, where P{D < D*} ≤ α, n = 4(1)30


              Significance level, α

n     0.001  0.005  0.010  0.025  0.050  0.100
4     -      -      -      -      2      2
5     -      -      2      2      4      6
6     -      2      4      6      8      14
7     2      6      8      14     18     26
8     6      12     16     26     34     44
9     12     22     28     38     50     64
10    22     36     44     60     74     92
11    34     54     66     86     104    128
12    52     78     94     120    144    172
13    76     110    130    162    190    226
14    106    148    172    212    246    290
15    142    194    224    270    312    364
16    186    250    284    340    390    450
17    238    314    356    420    480    550
18    300    390    438    512    582    664
19    372    476    532    618    696    790
20    454    574    638    738    826    934
21    546    686    758    870    972    1092
22    652    810    892    1020   1134   1270
23    772    950    1042   1184   1312   1464
24    904    1104   1208   1366   1510   1678
25    1050   1274   1390   1566   1726   1912
26    1212   1462   1590   1786   1960   2168
27    1390   1666   1808   2024   2216   2444
28    1586   1890   2046   2284   2494   2744
29    1800   2134   2306   2564   2796   3068
30    2032   2398   2584   2868   3120   3416


Note. For the corresponding upper-tail critical values, take ⅓n(n²−1) − D*.


Table 3 presents critical values of the coefficient of rank correlation, $r^*$, derived simply from the $D^*$
values in Table 2 by using (1). Lower-tail critical values of $r_s$ are, of course, simply $-r^*$.

Table 4 expresses several of the critical values from Tables 2 and 3 in standard units. These figures
may be taken to provide some idea as to the usefulness of the normal approximation outside the range
of the table. They indicate, for example, that at the 0.025, 0.050, and 0.100 levels a normal-based test
would be almost as accurate as the Type A series approximation when n > 30. For smaller levels it
would be, supposedly, somewhat less accurate, and the t-test might be applied.
Table 3. Approximate upper-tail critical values, r*, where P{r_s > r*} ≤ α, n = 4(1)30


              Significance level, α

n     0.001   0.005   0.010   0.025   0.050
4     -       -       -       -       0.8000
5     -       -       0.9000  0.9000  0.8000
6     -       0.9429  0.8857  0.8286  0.7714
7     0.9643  0.8929  0.8571  0.7500  0.6786
8     0.9286  0.8571  0.8095  0.6905  0.5952
9     0.9000  0.8167  0.7667  0.6833  0.5833
10    0.8667  0.7818  0.7333  0.6364  0.5515
11    0.8455  0.7545  0.7000  0.6091  0.5273
12    0.8182  0.7273  0.6713  0.5804  0.4965
13    0.7912  0.6978  0.6429  0.5549  0.4780
14    0.7670  0.6747  0.6220  0.5341  0.4593
15    0.7464  0.6536  0.6000  0.5179  0.4429
16    0.7265  0.6324  0.5824  0.5000  0.4265
17    0.7083  0.6152  0.5637  0.4853  0.4118
18    0.6904  0.5975  0.5480  0.4716  0.3994
19    0.6737  0.5825  0.5333  0.4579  0.3895
20    0.6586  0.5684  0.5203  0.4451  0.3789
21    0.6455  0.5545  0.5078  0.4351  0.3688
22    0.6318  0.5426  0.4963  0.4241  0.3597
23    0.6186  0.5306  0.4852  0.4150  0.3518
24    0.6070  0.5200  0.4748  0.4061  0.3435
25    0.5962  0.5100  0.4654  0.3977  0.3362
26    0.5856  0.5002  0.4564  0.3894  0.3299
27    0.5757  0.4915  0.4481  0.3822  0.3236
28    0.5660  0.4828  0.4401  0.3749  0.3175
29    0.5567  0.4744  0.4320  0.3685  0.3113
30    0.5479  0.4665  0.4251  0.3620  0.3059


Note. The corresponding lower-tail critical value for r_s is −r*.


Table 4. Values of r* in standard deviation units, n = 10(5)30


              Significance level, α

n     0.001  0.005  0.010  0.025  0.050
10    2.600  2.345  2.200  1.909  1.655
15    2.793  2.445  2.245  1.938  1.657
20    2.871  2.478  2.268  1.940  1.652
25    2.921  2.498  2.280  1.948  1.647
30    2.951  2.512  2.289  1.949  1.647
∞     3.090  2.576  2.326  1.960  1.645


David, S. T., Kendall, M. G. & Stuart, A. (1951). Some questions of distribution in the theory of
The bias of the maximum likelihood estimates of the location and scale
parameters given a type II censored normal sample


By J. G. SAW
University College London


1. INTRODUCTION


Gupta (1952) discussed the solution of the pair of equations in $\hat\mu$ and $\hat\sigma$, the maximum likelihood
estimates of the normal population parameters of location and scale, and gave the leading term for
var($\hat\mu$) and var($\hat\sigma$) when it is supposed that only the smallest r of n observations from the parent
population are available. The solution of the equations in $\hat\mu$ and $\hat\sigma$ is tiresome and some attempts have been made
to facilitate computations by using a fairly simple, systematic technique; see, for example, Cohen
(1957).

It has been convenient to assume that for samples of size n > 20 the bias of the likelihood estimates
will be small enough to warrant no consideration. The purpose of this note is to find the leading term
(i.e. the term of order 1/n) in the bias of $\hat\mu$ and of $\hat\sigma$, first, to comment on the validity of the assumption,
and secondly, so that by suitable adjustment the bias may be reduced to order 1/n² in each case.


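2. THE LIKELIHOOD EQUATIONS


Let $x_1 < x_2 < \ldots < x_r$ be the smallest r observations from a sample of size n from a normal population
with mean $\mu$ and standard deviation $\sigma$. We will use

$$z(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 u^2}, \qquad P(u) = 1 - Q(u) = \int_{-\infty}^{u} z(t)\, dt. \tag{2.1}$$

Then with this notation,

$$p(x_1, x_2, \ldots, x_r) = \frac{n!}{(n-r)!}\, \sigma^{-r} \prod_{i=1}^{r} z\bigl((x_i - \mu)/\sigma\bigr)\, \bigl\{Q\bigl((x_r - \mu)/\sigma\bigr)\bigr\}^{n-r}, \tag{2.2}$$

so that the likelihood equations are

$$\sum_{i=1}^{r} \frac{x_i - \hat\mu}{\hat\sigma} + (n-r)\, \frac{z((x_r - \hat\mu)/\hat\sigma)}{Q((x_r - \hat\mu)/\hat\sigma)} = 0, \tag{2.3a}$$

$$-r + \sum_{i=1}^{r} \frac{(x_i - \hat\mu)^2}{\hat\sigma^2} + (n-r)\, \frac{z((x_r - \hat\mu)/\hat\sigma)}{Q((x_r - \hat\mu)/\hat\sigma)}\, \frac{x_r - \hat\mu}{\hat\sigma} = 0. \tag{2.3b}$$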
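The pair of likelihood equations for a type II censored normal sample can be evaluated numerically. The Python sketch below assumes the standard form of these equations (the score of $-r\log\sigma - \tfrac12\sum u_i^2 + (n-r)\log Q(u_r)$ with $u_i = (x_i-\mu)/\sigma$); the function names are hypothetical and no root-finding is attempted, only evaluation of the two left-hand sides.

```python
import math


def z(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def q(u):
    """Upper-tail probability Q(u) = 1 - P(u) of the standard normal."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def likelihood_equations(mu, sigma, xs, n):
    """Left-hand sides of the two likelihood equations.

    xs holds the r smallest observations out of n; both returned values are
    zero at the maximum likelihood estimates (assumed standard form).
    """
    r = len(xs)
    u = [(x - mu) / sigma for x in xs]
    u_r = (max(xs) - mu) / sigma
    hazard = z(u_r) / q(u_r) if n > r else 0.0
    eq_mu = sum(u) + (n - r) * hazard
    eq_sigma = -r + sum(v * v for v in u) + (n - r) * hazard * u_r
    return eq_mu, eq_sigma


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(likelihood_equations(3.0, math.sqrt(2.0), xs, n=5))   # complete sample
print(likelihood_equations(3.0, math.sqrt(2.0), xs, n=10))  # five values censored
```

With no censoring (n = r) the equations are satisfied by the sample mean and the root mean square deviation; with censoring the first equation is positive at that point, indicating that $\hat\mu$ must lie above the mean of the observed values.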
Write X= h+on, (2-4) 


then v, < v, <... <v, are the r smallest ranked observations of a set of n observations from the 
standard normal population. We define 


% = rilrs (2-5) S,= 2 (v8), (2-6) S, = (t,—7,), (2:7) 


0=</e, (2-8) U, = (x, —pi)/G, (2-9) 
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a; by &(S8;) =a,+O(1/n) (¢= 1,2), (2-10) 
X, by P(X,) =p, =r/(n+1), Zz by 2 =2(X,), (2°11) 
2(x) 
d/dx)t 2 2-12 
$= (d/dey orl (2-12) 
Ss. 0 
Y(0,8;,S,) = = als +8, |- 5, -*" (2-13) 
ee n—r 2 oy ‘ 

G(9,S,,S_) = fom at - tt Y*0, S,,S.). (2-14) 

Now it may be shown (Saw, 1958) that 
a= 1-=| x, +=], @, = X, os (2-15) 
so that Y(1,a,,a,) = 0, (2-16) 
G(1, a,,4,) = —z,/(n+ 1) p,q. (2-17) 
If we put U, = X,+ Y(u,), (2-18) 


then after expanding 2(u,)/Q(u,) about z(X,)/Q(X,). we have using (2°3a) 
1 =) n—r 2 d: rt es = 
gm tu+"—" yy HV) = 0. (2-19) 


Eliminating the factor 2((#,—7)/%)/Q((a,— jfi)/o) between equations (2-3a) and (2-36), we arrive after 
substitution at 


S, 0 
ienees 2-20 
«= a(t) - 5 gs 
S _ 
so that Y(u,) = u,—X, = 6 Let +5, | wo % vw 
= ¥(0,S,,8,). (2°21) 


Using (2-20) and (2-21), we rewrite equation (2-19) as 


8S, 89 n-r2 hy, 
a »S.) = 0, 2-22 
OS, Sy 7 tol! ¥\(0,51 82) ( ) 


or Ga, Si, So) = 0. (2-23) 





Using an inverse Taylor expansion for 0 about 0 = 1, we have 
co or? _— Gg k 
k=1LOG* k! Jla,s,s,) 


In this last equation, 0 is to be considered a function of G defined by G(0,S,,S,) = 0. 
Expanding the right-hand side of equation (2-24) for S, and S, about a, and a, and taking expectations 
we have finally 








(2-24) 





«(Z-1) = Bhs Tacw (2-25) 
a \t iak9 (-G r 
a hes = . (2-26) 
“i ie > (=) (se) a8 aG* a1gtk! Jaa, as) 
LD, 5 = €(S,—a,)* (S_—a2)9 (2-27) 


and it may be shown that L;,,; is of order [}(i+7 + 1)] in 1/n, where [w] is the greatest integer which does 

not exceed w. 
Now by (2:4) and (2-9 
w by ( ) an ( ) (2—p)/o = v,—Ou,. (2-28) 


Using (2-20) we have after some arrangement 








Sy-1  (O=1) | {O-1) 


(i—w)/o = v,-S,-—2— + (2-29) 
3. MOMENTS OF $S_1$ AND $S_2$


In order to determine the coefficient of 1/(n + 1) in $E(\hat\mu)$ and in $E(\hat\sigma)$ it is necessary to know the first
two moments and product-moment of $S_1$ and $S_2$ to order 1/(n + 1). The technique involved in evaluating
these moments in terms of a power series in 1/(n + 1), the coefficients of which vary with $p_r = r/(n+1)$
only, has been described by David & Johnson (1954) and by the author (1958). The required results
are quoted here. If





1/d\ 
saa i! (aaa) 7 ene a te <p 
1 d \* 2x) 
estes d, = 2/0, 
a= F (ari) Pi)lenx, “™ 





then 
1 

&(S,— a) (n+ 1) = — p,9,(Codg + Cyd, + Cydq +d} + 2dgd,) +3 [3(¢9 + dy)? — 
T 


1 
&(S,—;)? (n+ 1) = p,9-(Cod, +, dg + Me += [—2+ 10d? + llegd, + 2c2 — 10dé — 22d3 cy 
r 


— 15¢2d? — 3dyc3 — 6(Cy + dy)? (1 — Cody —d2) + 4(1 + Cg dy +02) (1 — Cg dy— 
6(8,—a4) (n+ 1) = Ptlea ty) ~~ ley tay] + O(n) 
&(S_q—aq)*(n+1) = p,9,(¢, +44)? +. = “ — 202 — Bey dy — 3d? + 2(cy +dy)*] + O(n)-, 
&(S; —a,)(S,—a_)(n+1) = ea aa + 2dyd,) +7 ¢ dy+ 2d) + 3ce,+c3 
— 5(¢9 + dy) (1 — cody — 
In addition we have that ¢(S,—a,)‘(S,—a,)/ is of order [4(¢+7+1)] in 1/(n+1). 


4. BIAS OF $\hat\sigma$


2(1 + ¢yd9+ c)] +0O(n)-, 


q)]+ O(n), 


dT) — 3(Co sage 24+ dy) (1+ cody +c5)] + O(n). 





(3-1la) 


(3-16) 


(3-2) 


(3-3) 


(3-4) 


(3-5) 


(3-6) 


G(1, a,, a) is itself of order 1/(n+ 1). Using this, we evaluate the coefficients h, , ; as follows (each term 


in the expressions is to be evaluated at the point (1, a,, a@)) 





20 ag a a@ 20 
= -—-G— =-——— Pa =— ae 
ho:0 Ga’ hi:0 ~ 3S, qt On n+1), hoz aS, aq tOlnt+h, 
5-1 200 _, 2 os (22)? 297, omar) 
2:0 = 9] ~ ast a@ aS, aGaS, ss aga | FO +d" 
If #G20 a 30 2 30 
hecZal -—— —-—3— —— 1)-1, 
0:2 Al 983 0G aS, aGaS,* (=) | aaa 
eG 2 a 80 a 80 aG a 20 
ee hee +0(n+1)- 
a8, 0S, 0G aS, 8G0S, dS, GOS, * aS, aS, oG 


Evaluating these leading terms in $h_{i:j}$ and $L_{i,j}$ we arrive, using equation (2.25), at Table 1 which gives
the value of $B(\hat\sigma : p_r)$ in the expression

$$E(\hat\sigma) = \sigma + \frac{\sigma}{n+1}\, B(\hat\sigma : p_r) + O(n+1)^{-2} \tag{4.1}$$

for $p_r$ = 0.20(0.05)0.80.


Table 1. Values of B(σ̂ : p_r), in E(σ̂) = σ + {σ/(n+1)} B(σ̂ : p_r) + O(n+1)^{-2}

p_r    B(σ̂ : p_r)    p_r    B(σ̂ : p_r)
0.20   -4.777        0.55   -1.582
0.25   -3.751        0.60   -1.431
0.30   -3.087        0.65   -1.303
0.35   -2.614        0.70   -1.193
0.40   -2.258        0.75   -1.098
0.45   -1.983        0.80   -1.013
0.50   -1.762        1.00   -0.750


5. BIAS OF $\hat\mu$
From (2.29), since














xt X, Pr Ur - 
€(v,) +e 1) ., (5:1) 
S,-1 a,-—1 a,-l 1 
é Ss = = - a (@gLo:1—Lo:2) + 35 (4224 9 Ly 1) + (n+ 1), (5-2) 
0 
6(-1)1/8, =~ som4iy, (5-3) 
2 
B(G:p,) 1 
£9 = 1) [Sy = PF (yo La sa +g Dg .a) +O( +1), (5-4) 
2 2 


we may construct Table 2 which gives the value of $B(\hat\mu : p_r)$ in the expression

$$E(\hat\mu) = \mu + \frac{\sigma}{n+1}\, B(\hat\mu : p_r) + O(n+1)^{-2}. \tag{5.5}$$

Values of var($\theta$) = var($\hat\sigma/\sigma$) can be obtained from Gupta's tables (1952) or from

$$\operatorname{var}(\theta) = h_{1:0}^2 L_{2,0} + 2 h_{1:0} h_{0:1} L_{1,1} + h_{0:1}^2 L_{0,2} + O(n+1)^{-2}. \tag{5.6}$$


Table 2. Values of B(μ̂ : p_r), in E(μ̂) = μ + {σ/(n+1)} B(μ̂ : p_r) + O(n+1)^{-2}


p_r    B(μ̂ : p_r)    p_r    B(μ̂ : p_r)
0.20   -5.538        0.55   -0.750
0.25   -3.792        0.60   -0.583
0.30   -2.766        0.65   -0.450
0.35   -2.078        0.70   -0.342
0.40   -1.588        0.75   -0.254
0.45   -1.232        0.80   -0.181
0.50   -0.960        1.00   -0.000


6. SUMMARY


The tables of $B(\hat\mu : p_r)$ and $B(\hat\sigma : p_r)$ indicate that the bias in small samples may be considerable when
$p_r = r/(n+1)$ is small. For example, for the case n = 19, r = 7, the bias in the estimate for $\sigma$ may be
as high as 13%, and in the case n = 19, r = 4, the bias may be as high as 24%. In order to comment
more precisely it would be necessary to obtain some higher terms in the power series expansion in 1/(n + 1)
for the bias of $\hat\mu$ and of $\hat\sigma$; some additional light could perhaps be thrown by a sampling experiment,
but in any event it would seem advisable to construct the corrected estimates:





A 
A o A 
= f— ——Bk:p,), 
He = (n+1) (4: Pr) 
a Bie 
C, *-Sa 


in order to reduce the bias to order 1/(n + 1)?. 


The research reported here was supported in part by the Department of United States Army and was 
carried out at the University of North Carolina, Department of Bio-Statistics. 
On the solution of the likelihood equation by iteration processes






By B. K. KALE 
Department of Mathematics and Statistics, University of Poona, India 











1. The exact determination of the maximum likelihood estimate (MLE) is often quite difficult. In
such cases the MLE is approximated by using some iteration process, the usual being the method of scoring
for parameters as given by Rao (1952, pp. 165-7). The idea is essentially due to Fisher (1925).

If $T_0$ is the trial solution of the likelihood equation, $T_r$ the $r$th iterate and $\hat\theta$ the MLE, then the use of
an iteration process is justified if the error $|T_r - \hat\theta|$ decreases with increased iterations and tends to zero
as $r$ tends to infinity. As $T_r$ and $\hat\theta$ are both random variables, we have to investigate the probability with
which the above properties are realized in an application of an iteration process. The use of the iteration
process is justified in large samples at least, if the above probability tends to one as $n$, the sample size,
tends to infinity.

In this paper we consider some iteration processes, and the sufficient conditions under which they 
have the desirable properties. We afterwards investigate the probability with which the sufficient 
conditions are satisfied. 

In effect, it is shown that the iteration processes usually applied in practice are justifiable, in large 
samples at least. 













































2. Let $f(x, \theta)$ be the probability density function which satisfies the regularity conditions given by
Cramér (1954), and whose range does not depend upon the parameter. In this paper we will consider the
single-parameter case only; the multiparametric case will be considered in a separate paper.

Under the regularity conditions, the following well-known results have already been proved by
Cramér (1954) and Huzurbazar (1948).

R.1. With probability approaching certainty as $n \to \infty$, the likelihood equation admits a consistent
solution (Cramér).

R.2. $(1/n)(d^2 \log L/d\theta^2)_{\theta=\hat\theta}$ converges in probability to $-I(\theta_0)$ as $n \to \infty$ (Huzurbazar).

R.3. The consistent solution of the likelihood equation is unique and $P[(d^2 \log L/d\theta^2)_{\theta=\hat\theta} < 0] \to 1$
as $n \to \infty$ (Huzurbazar).

The following lemma will be used in the sequel:


Lemma. If $Y$ and $Z$ are random variables and $a$ and $b$ are positive constants,
$$P[|Y + Z| \ge a + b] \le P[|Y| \ge a] + P[|Z| \ge b].$$
This result has been given by Fréchet (1950, chapter 1).


3. Let $\psi(\theta)$ be a differentiable function of $\theta$, which has no zero in a neighbourhood of $\hat\theta$, the root of the
likelihood equation which we assume to exist. Define

$$\phi(\theta) = \theta - \psi(\theta)\, \frac{d \log L}{d\theta}. \tag{1}$$

Consider the iteration process $T_{r+1} = [\phi(\theta)]_{\theta = T_r}$, that is,

$$T_{r+1} = T_r - \psi(T_r) \Bigl(\frac{d \log L}{d\theta}\Bigr)_{\theta = T_r}. \tag{2}$$

Let $e_r = |T_r - \hat\theta|$ be the error at the $r$th iteration; then the choice of $\psi(\theta)$ is to be made in such a way that
$e_{r+1} < e_r$ and $e_r \to 0$ as $r \to \infty$, so that process (2) converges to $\hat\theta$.

Householder (1953, pp. 118-22) has shown that the following conditions are sufficient to ensure that
$e_{r+1} < e_r$ and that $e_r \to 0$.

(A) There exists a $\rho$-neighbourhood of $\hat\theta$, $N_\rho(\hat\theta)$, such that if $\theta'$ and $\theta'' \in N_\rho(\hat\theta)$ we have for some $k > 0$
$$\frac{|\phi(\theta') - \phi(\theta'')|}{|\theta' - \theta''|} \le k, \quad\text{where } 0 < k < 1.$$
If further $\phi(\theta)$ is differentiable, $|\phi'(\hat\theta)| < 1$ ensures the existence of $N_\rho(\hat\theta)$.

(B) The initial solution $T_0 \in N_\rho(\hat\theta)$.

We will consider different iteration processes corresponding to the different choices of $\psi(\theta)$, and then
investigate with what probability the conditions (A) and (B) are satisfied. If, at least, we can prove that
conditions (A) and (B) are satisfied with probability approaching certainty as $n \to \infty$, the use of the
iteration process is justified in large samples.
4. Consider first the Newton-Raphson process, where $\psi(\theta) = [d^2 \log L/d\theta^2]^{-1}$. As by R.3

$$P\Bigl[\Bigl(\frac{d^2 \log L}{d\theta^2}\Bigr)_{\theta=\hat\theta} < 0\Bigr] \to 1 \quad\text{as } n \to \infty,$$

and $[d^2 \log L/d\theta^2]^{-1}$ is differentiable for $\theta$, $\psi(\theta)$ has no zero in some neighbourhood of $\hat\theta$ with probability
approaching unity as $n \to \infty$. Here

$$\phi(\theta) = \theta - \frac{d \log L}{d\theta} \Big/ \frac{d^2 \log L}{d\theta^2},$$

therefore $\phi'(\hat\theta) = 0$, on the assumption that $(d \log L/d\theta)_{\theta=\hat\theta} = 0$. By R.1 $\hat\theta$ exists with probability
approaching unity as $n \to \infty$. As $\phi'(\hat\theta) = 0$ implies $|\phi'(\hat\theta)| < 1$, the probability that condition (A) is
satisfied tends to unity as $n \to \infty$.

As for condition (B), we will choose $T_0$ to be a consistent estimate; then as $T_0$ and $\hat\theta$ are both consistent,

$$P[|T_0 - \theta_0| \ge \tfrac12\rho] \to 0 \quad\text{as } n \to \infty, \tag{3}$$

and

$$P[|\hat\theta - \theta_0| \ge \tfrac12\rho] \to 0 \quad\text{as } n \to \infty. \tag{4}$$

Now $T_0 - \hat\theta = (T_0 - \theta_0) + (\theta_0 - \hat\theta)$.
Hence applying Fréchet's lemma

$$P[|T_0 - \hat\theta| \ge \rho] \le P[|T_0 - \theta_0| \ge \tfrac12\rho] + P[|\hat\theta - \theta_0| \ge \tfrac12\rho].$$

By (3) and (4) it follows that $P[|T_0 - \hat\theta| \ge \rho] \to 0$ as $n \to \infty$.
Thus

$$P[T_0 \in N_\rho(\hat\theta)] \to 1 \quad\text{as } n \to \infty, \tag{5}$$

when $T_0$ is consistent. This shows that condition (B) is also satisfied with probability approaching unity
as $n \to \infty$.

In all the other processes also, the initial solution will be chosen to be consistent, and hence by (5)
the probability that condition (B) is satisfied approaches certainty as $n \to \infty$. This result we will take
as proved in the other processes and we will concentrate attention only on condition (A).


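5. Now consider the method of scoring for parameters. Here

$$\psi(\theta) = -\frac{1}{n I(\theta)}, \quad\text{where}\quad I(\theta) = E\Bigl(-\frac{d^2 \log f}{d\theta^2}\Bigr) \quad\text{and}\quad 0 < I(\theta) < \infty.$$

Now $[I(\theta)]^{-1}$ has no zero in the neighbourhood of $\hat\theta$. We further assume that $I(\theta)$ is differentiable. We have

$$\phi(\theta) = \theta + \frac{d \log L}{d\theta} \Big/ n I(\theta),$$

therefore

$$\phi'(\hat\theta) = 1 + \Bigl[\frac{d^2 \log L}{d\theta^2} \Big/ n I(\theta)\Bigr]_{\theta=\hat\theta}, \quad\text{as}\quad \Bigl(\frac{d \log L}{d\theta}\Bigr)_{\theta=\hat\theta} = 0.$$

The condition $|\phi'(\hat\theta)| < 1$ is equivalent to

$$-2I(\hat\theta) < \frac{1}{n} \Bigl(\frac{d^2 \log L}{d\theta^2}\Bigr)_{\theta=\hat\theta} < 0. \tag{6}$$

By R.3

$$P\Bigl[\frac{1}{n} \Bigl(\frac{d^2 \log L}{d\theta^2}\Bigr)_{\theta=\hat\theta} < 0\Bigr] \to 1 \quad\text{as } n \to \infty;$$

hence the right-hand part of the inequality (6) is satisfied with probability approaching unity as $n \to \infty$.

As for the other part, consider $(1/n)(d^2 \log L/d\theta^2)_{\theta=\hat\theta} + 2I(\hat\theta)$. From R.2, $(1/n)(d^2 \log L/d\theta^2)_{\theta=\hat\theta}$
converges in probability to $-I(\theta_0)$ as $n \to \infty$. As $I(\theta)$ is differentiable, $I(\theta)$ is also a continuous function of $\theta$.
Further $\hat\theta$ converges in probability to $\theta_0$ as $n \to \infty$. Hence $I(\hat\theta)$ converges in probability to $I(\theta_0)$ as $n \to \infty$.
This follows from the result given by Fréchet (1950), that if $x_n$ converges in probability to $x$ then if $g$
is a continuous function $g(x_n)$ converges in probability to $g(x)$. Thus $(1/n)(d^2 \log L/d\theta^2)_{\theta=\hat\theta} + 2I(\hat\theta)$
converges in probability to $I(\theta_0)$ which is positive. Therefore

$$P\Bigl[\frac{1}{n} \Bigl(\frac{d^2 \log L}{d\theta^2}\Bigr)_{\theta=\hat\theta} + 2I(\hat\theta) > 0\Bigr] \to 1 \quad\text{as } n \to \infty,$$

that is,

$$P\Bigl[-2I(\hat\theta) < \frac{1}{n} \Bigl(\frac{d^2 \log L}{d\theta^2}\Bigr)_{\theta=\hat\theta}\Bigr] \to 1 \quad\text{as } n \to \infty. \tag{7}$$

This shows that the left-hand part of inequality (6) is also satisfied with probability approaching certainty
as $n \to \infty$. Thus the condition (A) is satisfied with probability approaching unity as $n \to \infty$.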
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6. The choice y(@) = —k/n for a suitable choice of constant k is very convenient as the numerical 
calculations are reduced. For this choice 
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k dlogL k (dlog L 
0 — 0 ——_— 4 = - | —-— 
9(9) so do and 9'(6) ‘+7 ( dg? Ss 
and the condition equivalent to (A) is 
k (d?log L ) 
-2< *( ae? ),-»< (8) 


Choose k > 0 then by R.3 





k (dlog L ) 
{= ( 7 ),-9<0}>? as n>. 


Now (k/n) (d? log L/d6*),_3 +2 converges in probability to —kI(0))+2 as n > oo. If 1(,) is known we 
can choose k > 0 so that —kI(0))+2 > 0so that inequality (8) is satisfied with probability approaching 
unity asm > oo. For example, in the case of Cauchy’s distribution given by 1/[7{1 + (a —@)*}], I(@)) = 480 
that any k from (0,4) can be chosen. The best choice is k = 2 for which ¢’(@) converges in probability 
to zero, and it is known that for such a choice of ¢(@), the process is rapidly convergent. } 
Such a choice is possible if I(@) is constant, which is very rarely the case. As suggested by Huzurbazar 
(1955), we may first try the parameter transformation 0 to a which will preserve regularity conditions 
and will make J(@) independent of the parameter, and then apply the process with suitable choice of k. 
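
The constant-$k$ process of §6 can be sketched numerically for the Cauchy case. The following is only an illustration under stated assumptions: the "sample" is not real data but the $n$ mid-point quantiles of a Cauchy distribution centred at a hypothetical `theta0`, and the function names are invented for this sketch.

```python
import math

def cauchy_score(theta, xs):
    """d log L / d theta for the Cauchy location model 1/[pi{1+(x-theta)^2}]."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def constant_k_iteration(xs, start, k=2.0, steps=20):
    """Iterate theta <- theta + (k/n) * dlogL/dtheta, i.e. psi(theta) = -k/n."""
    n = len(xs)
    theta = start
    for _ in range(steps):
        theta += (k / n) * cauchy_score(theta, xs)
    return theta

# Assumption of this sketch: an idealized, symmetric stand-in for a sample,
# namely the n mid-point quantiles of a Cauchy distribution centred at theta0.
theta0, n = 1.5, 201
xs = [theta0 + math.tan(math.pi * ((i - 0.5) / n - 0.5)) for i in range(1, n + 1)]

# Start from a deliberately displaced value and use k = 2 (since I = 1/2).
estimate = constant_k_iteration(xs, start=theta0 + 0.3)
print(estimate)
```

Because the stand-in sample is symmetric about `theta0`, the likelihood equation is solved at `theta0`, so the printed value shows how quickly the choice $k = 2$ settles on the root.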


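7. Define the iteration process as follows,

$$T_{r+1} = T_r + \frac{a_r}{n}\left(\frac{d\log L}{d\theta}\right)_{\theta=T_r}, \qquad (9)$$

where $a_r$ is a sequence of real numbers to be suitably chosen, such that $e_{r+1} < e_r$ and $e_r \to 0$ as $r \to \infty$.
Hildebrand (1956, pp. 443-50) has shown that for suitable choice of $a_r$ the following conditions are
sufficient:

$$\text{(A)}\quad -2 < \frac{a_r}{n}\left(\frac{d^2\log L}{d\theta^2}\right)_{\theta=\hat\theta} < 0 \quad\text{for all } r > r_0,$$

$$\text{(B)}\quad T_1 \in N_\rho(\hat\theta).$$

For fixed $r$, $(a_r/n)(d^2\log L/d\theta^2)_{\theta=\hat\theta}$ converges in probability to $-I(\theta_0)a_r$. Therefore, if we choose
$a_r > 0$ and such that $a_r \to 0$ as $r \to \infty$, then as $d^2\log L/d\theta^2$ is almost everywhere finite, there exists $r_0$
such that

$$-2 < \frac{a_r}{n}\left(\frac{d^2\log L}{d\theta^2}\right)_{\theta=\hat\theta} \quad\text{for all } r > r_0,$$

and as $a_r > 0$ we have

$$P\left[\frac{a_r}{n}\left(\frac{d^2\log L}{d\theta^2}\right)_{\theta=\hat\theta} < 0 \ \text{for all } r > r_0\right] \to 1 \quad\text{as } n \to \infty.$$

Thus the probability that condition (A) is satisfied approaches certainty as $n \to \infty$. Again for $T_1$ we
use a consistent estimate so that the condition (B) is also satisfied with probability approaching
certainty as $n \to \infty$.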


8. If a distribution admits a sufficient statistic, the likelihood equation has the form

$$\Phi(\theta) = \sum_{i=1}^{n} u(x_i), \quad\text{i.e.}\quad \hat\theta = \Phi^{-1}\left[\sum_{i=1}^{n} u(x_i)\right]. \qquad (10)$$

We may then apply any of the iteration processes considered above to solve (10). If the iteration process
is applied, the condition (A) is satisfied with certainty even for finite $n$, but the condition (B) is satisfied
with probability approaching certainty as $n \to \infty$, and the results are not free from probability.
Instead we may use inverse interpolation methods as follows. We calculate several values $\Phi(\theta_1)$,
$\Phi(\theta_2), \ldots, \Phi(\theta_m)$, where $(\theta_1, \theta_m)$ encloses $\hat\theta$, and then interpolate by using a suitable formula for the
observed value $\sum_{i=1}^{n} u(x_i)$. No considerations of probability will arise in such a procedure. The methods of
inverse interpolation can, however, be applied conveniently only when the likelihood equation has form
(10), in which case it can be easily seen that the distribution admits a sufficient statistic.
For example, consider the estimation of $\lambda$ where

$$f(x, \lambda) = \frac{1}{\Gamma(\lambda+1)}\,x^{\lambda}e^{-x} \quad (x > 0,\ \lambda+1 > 0).$$

Here $\Phi(\lambda) = n\,\dfrac{d}{d\lambda}[\log\Gamma(\lambda+1)]$ and $u(x) = \log x$.

$\Phi^{-1}$ is very complicated and an explicit solution for $\hat\lambda$ is not possible. We may use Pairman's (1919)
Tables of Digamma Function to tabulate $\Phi(\lambda)$ for various values of $\lambda$ and then interpolate for $\sum_{i=1}^{n} u(x_i)$.

9. By way of comparison of the iterative processes, only the Newton–Raphson process is of second
order, while all the others are of first order. The Newton–Raphson process is accordingly rapidly
convergent. However, this process usually involves more cumbersome calculations than any other. The
method of scoring for parameters can be applied if $I(\theta)$ is differentiable for $\theta$, which is usually the case. The
choice of $\psi(\theta)$ to be a suitable constant is very rarely possible. As for the selection of a suitable sequence
$a_r$ in the process given by (9), we must have $a_r \to 0$ as $r \to \infty$ and $a_r > 0$. However, in a given problem
the choice of $a_r$ is not as arbitrary as it might appear, if the method is to be practically useful. It is to some
extent governed by the behaviour of $d\log L/d\theta$, as it should be. We should choose $a_r$ so that the process
does not converge too slowly. A simple choice of $k/r$ or $k/r^2$ is often quite convenient.

To compare the methods we will consider the following example given by Fisher (1954, pp. 299-320),
based on Carver's data, for two factors in maize, starchy v. sugary and green v. white. The problem is to
evaluate the maximum likelihood estimate of $\theta$, when the probability of belonging to four classes,
respectively, is $\tfrac14(2+\theta)$, $\tfrac14(1-\theta)$, $\tfrac14(1-\theta)$, $\tfrac14\theta$.

Table 1
        Starchy             Sugary
    Green    White      Green    White      Total
    1997     906        904      32         3839


As shown by Fisher, quite a few consistent estimates are available. We use $T_1 = (1/n)(a-b-c+d)$,
where $a$, $b$, $c$, $d$ are the observed frequencies of the respective classes and $n$ is the total number of cases.
Thus $T_1 = 0.057046$. Fisher has shown that the likelihood equation is quadratic and has two roots,
one positive and one negative. The positive root is the required value of $\hat\theta$, namely 0.035712. As the exact
value of $\hat\theta$ was available, the comparison of the methods was easier.

10. The following table shows the successive iterates obtained by three methods: 

(I) Newton—Raphson process; 
(II) method of scoring for parameters; 

(III) the choice of an arbitrary sequence a,. 

The choice of $a_r = 1/(5r)$ in (III) was based on the observation that $(1/n)(d\log L/d\theta)_{\theta=T_1} = -0.100997$
and that the likelihood equation suggested that $\hat\theta < T_1$, but $\hat\theta > 0.03$.

Four iterations are calculated. The entry 'Adjustment' gives the value of the correction to $T_4$. The
'Error' gives the value of $|T_5-\hat\theta|$. The calculations for (I) were found in an interesting note by Norton
(1956). Except 7',, our values differ from those of Norton in the sixth place of decimals. 7’, and 7 for 
method (II) were given by Stuart (1958). Our value of 7’; differs from that of Stuart at the fourth place 
of decimals. 


Table 2. Successive iterations by three different methods listed above

Iterates        (I)          (II)         (III)
$T_1$         0.057046     0.057046     0.057046
$T_2$         0.025631     0.036986     0.036847
$T_3$         0.033004     0.035791     0.036057
$T_4$         0.035707     0.035718     0.035894
$T_5$         0.035713     0.035713     0.035832
Adjustment    0.000006    -0.000005    -0.000062
Error         0.000001     0.000001     0.000120

It has been observed in §9 that method I is rapidly convergent and is asymptotically better than
method II. The above example, however, indicates that method II is closer to the true value than
method I for the two iterations $T_2$ and $T_3$, and as good thereafter. This needs some explanation.

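An iteration process is of order $k$ if $\lim_{r\to\infty} e_{r+1}/(e_r)^k = \alpha$, where $\alpha$ is a constant depending on the root
of the equation; thus for large $r$, for the $k$th-order iteration process

$$e_{r+1} \sim (e_r)^k\,\alpha(\hat\theta). \qquad (11)$$

When the process is convergent $e_r < 1$ and the higher the order $k$, the more rapidly will $e_r \to 0$ as $r \to \infty$.

For method I we have

$$k = 2, \quad \alpha_1(\hat\theta) = \left|\frac12\left(\frac{d^3\log L}{d\theta^3}\right)_{\theta=\hat\theta}\bigg/\left(\frac{d^2\log L}{d\theta^2}\right)_{\theta=\hat\theta}\right|, \qquad (12)$$

and for method II we have

$$k = 1, \quad \alpha_2(\hat\theta) = \left|1 + \left(\frac{d^2\log L}{d\theta^2}\right)_{\theta=\hat\theta}\bigg/ nI(\hat\theta)\right|. \qquad (13)$$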
Method I will therefore generally be better than method II. If, however, $\alpha_2(\hat\theta)$ is substantially
smaller than $\alpha_1(\hat\theta)$, method II may give better results for a few iterations. In
the example considered above, $\alpha_1(\hat\theta) = 25.55$ while $\alpha_2(\hat\theta) = 0.06$.

Note that as $\alpha_2(\hat\theta)$ converges in probability to zero as $n \to \infty$, for large samples $\alpha_2(\hat\theta)$ will usually be
small. Moreover, as $\hat\theta$ is not known, the choice between the methods may be based on $\alpha_1(T_1)$ and $\alpha_2(T_1)$.
If $\alpha_1(T_1)$ is substantially larger than $\alpha_2(T_1)$, there may not be much advantage in choosing method I
as its application involves rather laborious calculations. In the example given above $\alpha_1(T_1) = 16.36$
and $\alpha_2(T_1) = 0.373$, and method II may be preferred.
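
The comparison of §10 can be re-run numerically. The sketch below is an illustration only: it assumes the multinomial likelihood with cell probabilities $\tfrac14(2+\theta)$, $\tfrac14(1-\theta)$, $\tfrac14(1-\theta)$, $\tfrac14\theta$ and the frequencies of Table 1, and the helper names are invented here. Small differences from the tabulated iterates in the last decimal places are to be expected from rounding.

```python
A, B, C, D = 1997, 906, 904, 32      # observed frequencies from Table 1
N = A + B + C + D

def score(t):
    """d log L / d theta."""
    return A / (2 + t) - (B + C) / (1 - t) + D / t

def second_derivative(t):
    """d^2 log L / d theta^2."""
    return -A / (2 + t) ** 2 - (B + C) / (1 - t) ** 2 - D / t ** 2

def information(t):
    """I(theta) per observation for the four-class multinomial model."""
    return 0.25 * (1 / (2 + t) + 2 / (1 - t) + 1 / t)

def iterate(step, start, count=4):
    """Return [T_1, ..., T_{count+1}] where T_{r+1} = T_r + step(T_r, r)."""
    values = [start]
    for r in range(1, count + 1):
        values.append(values[-1] + step(values[-1], r))
    return values

t1 = (A - B - C + D) / N                                          # consistent start
newton = iterate(lambda t, r: -score(t) / second_derivative(t), t1)     # method I
scoring = iterate(lambda t, r: score(t) / (N * information(t)), t1)     # method II
sequence = iterate(lambda t, r: score(t) / (5 * r * N), t1)             # method III, a_r = 1/(5r)

for name, values in (("I", newton), ("II", scoring), ("III", sequence)):
    print(name, [round(v, 6) for v in values])
```

Each list holds the five iterates $T_1, \ldots, T_5$ for one method, so the printed rows can be set beside the columns of Table 2.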


I sincerely thank Dr V. S. Huzurbazar for his guidance and encouragement in preparing this paper
and Dr B. R. Rao for going through an earlier draft. I also thank the referee for drawing my attention
to Stuart's paper. I thank the Government of India for awarding me a research training scholarship.


Detection of best and outlying normal populations with known variances


By A. ZINGER 
University of Montreal, Canada 


1. SuMMARY 


The following two problems are considered in the case of several normal populations with known
variances:

(i) The selection of the 'best' population (the population with the largest mean), given a probability
of taking a wrong decision.

(ii) A test for outliers in the case of up to seven normal populations.

The statistic used in both cases is the standardized difference between the two largest sample means.
The 'best' population can also be defined as the one with the smallest mean.


2. INTRODUCTION 


The present paper is an extension of a similar problem considered by Zinger & St-Pierre (1958) in the 
case of three normal populations. 

The non-null distribution of the standardized difference between the two largest sample values is 
derived in the general case of several normal populations with known variances. The least favourable 
configurations are given and two tests are presented. One is for detecting the 'best' population and does
not depend on the number of populations. The other one is for detecting outliers and some critical 
values are given for up to seven normal populations. All numerical calculations were done on an LGP-30 
computer. 


3. NON-NULL DISTRIBUTION OF THE STANDARDIZED DIFFERENCE BETWEEN
THE TWO LARGEST SAMPLE MEANS


Consider $n+1$ normal populations with unknown means $\mu_i$ and known variances $\sigma_i^2$ $(i = 0, 1, \ldots, n)$.
From the $i$th population, a sample of $n_i$ values $x_{ij}$ $(i = 0, 1, \ldots, n;\ j = 1, \ldots, n_i)$ is drawn. Let $\bar x_i$ be the
sample mean associated with the population having mean $\mu_i$. Let $\bar x_{(0)} > \bar x_{(1)} > \ldots > \bar x_{(n)}$ be the ordered
sample means; the event $\bar x_i = \bar x_j$, $i \ne j$, will of course be neglected. Consider $\mu_{(0)} > \mu_{(1)} > \ldots > \mu_{(n)}$ the
ordered unknown means. It is not known which population is associated with $\mu_{(i)}$. Only the following
$(n+1)!$ mutually exclusive events are possible: $(\bar x_{(0)}, \bar x_{(1)}, \ldots, \bar x_{(n)})$ comes from the populations with
means $(\mu_{(i_0)}, \mu_{(i_1)}, \ldots, \mu_{(i_n)})$, with $i_0 \ne i_1 \ne \ldots \ne i_n = 0, 1, \ldots, n$. It follows that the joint density of
$\bar x_{(0)}, \bar x_{(1)}, \ldots, \bar x_{(n)}$ is given by

$$f(\bar x_{(0)}, \bar x_{(1)}, \ldots, \bar x_{(n)}) = \prod_{i=0}^{n}\frac{n_i^{1/2}}{(2\pi)^{1/2}\sigma_i}\ {\sum}'\exp\left\{-\frac12\sum_{k=0}^{n}\frac{n_{(i_k)}\,(\bar x_{(k)}-\mu_{(i_k)})^2}{\sigma_{(i_k)}^2}\right\},$$

where $\sum'$ stands for summation over all permutations of $i_0, i_1, \ldots, i_n$. The proposed tests assume that the
values of $\sigma_0, \sigma_1, \ldots, \sigma_n$ are known and that the sample sizes are chosen so that $\sigma_i^2/n_i = \sigma^2$ (say 1)
$(i = 0, 1, \ldots, n)$.

To obtain the distribution of u = (%—Zw)s 
let Yo = (Xoy— Miy) + ++» + (Fn — He,» 


and Yx = (Zo — May) — (Gao— May) (k= 1...) 
Integrating out yo, one gets 


S (Yas +++ Yn) = n Lvi-? >» nv}. 


1 1 
| ex —_—— ——— 
(n+ 1)? (2z7)t" P| 2n+1 ( i= 1<i<j 
The domain of integration becomes 


Ye > YratMiypy—My_»y (k= 2,-+-50)5 


; i) ‘ I . Kl )dyy...d 
Seneereraneceeeminmmeeoree eee Yrs +++9 Yn) Wn +++ Fe. 
(n ig 1)3 (27)** Yi-+ Mig) — Mix) Yn—1+ Min) — Min—1) " 


and S(Y¥) = 


Let u, = y, and 


Rey | ARERR. een 
e = Ue Tel) Vky(e+1) 


(Yi +--+ +Yx-1) (& = 2,...)%)- 
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The domain of integration becomes for k = 3, ...,n 


V(k— 1) Ug + VR (Miy — May_y) 





A(k+ 1) 
oan ty > Uy + (Mey — May) ’ 
V6 


Putting u = u, — (Mu, — Huy), one gets 


u) = ——— 2’ exp }H{ —(ut+py)—Ma)*}- 
f( ) V2 (2m)in pif ( Ha, Hay)*} 
co 
«| exp (— 4). 
(U— Palo) — (Hix) + 2Hc42))/ V6 
‘co 
«| exp (— 4u?)... 
(V2 ust V3( Mis) — Mig)))/V 4 
ioe) 
af exp (— 4u?)du,... dug. 
(V(n—1) Un—1t+ V (Min) — Min—))/-V (n+ 1) 


The function $f(u)$, consisting of $(n+1)!$ terms, can be written as the sum of a function $g(u)$, containing
the $n!$ terms with $i_0 = 0$, which correspond to a good decision, and of a function $w(u)$, containing the
remaining $n \cdot n!$ terms with $i_0 \ne 0$, which correspond to a wrong decision.


4. DETECTION OF THE BEST POPULATION
In order to detect the best population the following procedure is proposed:
(i) Draw $n_i$ observations from the $i$th population, subject to the restrictions
$n_i/\sigma_i^2 = 1/\sigma^2$ $(i = 0, 1, \ldots, n)$.

(ii) Let $\bar x_{(0)} > \bar x_{(1)} > \ldots > \bar x_{(n)}$
be the ordered sample means. Compute $u = (\bar x_{(0)} - \bar x_{(1)})/\sigma$.

(iii) If $u > k$, decide that $\bar x_{(0)}$ comes from the population with mean $\mu_{(0)}$;

if $u < k$ do not take the above decision.

This procedure will be referred to as the I-procedure, since it was introduced by Irwin (1925).

The critical value $k$ is to be chosen in such a way that the probability of a wrong decision is at most
equal to a number $\alpha$ $(0 < \alpha < 1)$ given in advance. $\alpha$ will be called the level. Define

$$\gamma = (\mu_{(0)} - \mu_{(1)})/\sigma$$

and

$$\delta_i = (\mu_{(i)} - \mu_{(i+1)})/\sigma \quad (i = 1, \ldots, n-1).$$

If

$$W(k, \gamma, \delta_1, \ldots, \delta_{n-1}) = \int_k^{\infty} w(u)\,du,$$

then the condition becomes $\Pr(u > k$ and $\bar x_{(0)}$ comes from $\mu_{(1)}$ or $\ldots$ or $\mu_{(n)}) = W(k, \gamma, \delta_1, \ldots, \delta_{n-1}) \le \alpha$.

There remains the choice of the least favourable configuration, i.e. of the least favourable value of the
vector $(\gamma, \delta_1, \ldots, \delta_{n-1})$. This least favourable configuration is not unique. By numerical calculations it is
possible to show that for a given $n+1$ there exists a value $k_0$ and a corresponding level $\alpha_0$ such that:

If $k < k_0$, the least favourable configuration is the null configuration $(0, 0, \ldots, 0)$; if $k > k_0$, the least
favourable configuration is the pseudo-null configuration $(0, \infty, \ldots, \infty)$.

The values of $k_0$ and $\alpha_0$ for $n+1 = 3, \ldots, 7$ are:


Table 1
n+1     k_0       α_0
3       0.8568    0.2723
4       0.7537    0.2970
5       0.6906    0.3127
6       0.6468    0.3237
7       0.6142

Since, in practice, the levels will always be smaller than $\alpha_0$, the value $k$ is the solution of $W(k, 0, \infty, \ldots, \infty) = \alpha$,
i.e.

$$\frac{1}{\sqrt{2\pi}}\int_{k/\sqrt2}^{\infty}\exp(-\tfrac12 t^2)\,dt = \alpha.$$

This is easily available.

The following example shows how to apply the I-procedure.

Consider five normal populations with variances 20, 40, 5, 20 and 10, respectively. Assume a level of
0.05. The critical value is then $k = 2.326$. Suppose that the sample sizes are, respectively, 80, 160, 20,
80 and 40, so that $\sigma = 0.5$. Let 22.1, 21.4, 23.6, 22.2 and 21.7 be the sample means. Since $u = 2.8$, the
third population is declared to be the 'best' at a 5% level.
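
The worked example can be checked with a few lines of code. The sketch below is illustrative and assumes a level below $\alpha_0$, so that the critical value comes from the pseudo-null configuration, i.e. $k = \sqrt2\,z_{1-\alpha}$ with $z_{1-\alpha}$ the upper normal quantile; the function name `i_procedure` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def i_procedure(means, variances, sizes, alpha):
    """Return (index of largest sample mean, u, k, decision taken?)."""
    ratios = [v / m for v, m in zip(variances, sizes)]
    sigma2 = ratios[0]
    # The procedure requires sigma_i^2 / n_i to be the same for every population.
    if any(abs(ratio - sigma2) > 1e-12 for ratio in ratios):
        raise ValueError("sample sizes must make sigma_i^2 / n_i constant")
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    u = (means[order[0]] - means[order[1]]) / sqrt(sigma2)
    # Critical value under the pseudo-null configuration (valid for alpha < alpha_0).
    k = sqrt(2.0) * NormalDist().inv_cdf(1.0 - alpha)
    return order[0], u, k, u > k

best, u, k, decided = i_procedure(
    means=[22.1, 21.4, 23.6, 22.2, 21.7],
    variances=[20, 40, 5, 20, 10],
    sizes=[80, 160, 20, 80, 40],
    alpha=0.05,
)
print(best, round(u, 3), round(k, 3), decided)
```

The returned index is zero-based, so index 2 corresponds to the third population of the example.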


5. DETECTION OF OUTLIERS 


In the case of outliers, the least favourable configuration is the null configuration. The critical value
$k$ is the solution of the equation $W(k, 0, 0, \ldots, 0) = \alpha$.


The critical values are:


Table 2

No. of populations                 Level
      n+1             0.10     0.05     0.01     0.001
       3              1.556    1.957    2.738    3.640
       4              1.422    1.780    2.489    3.320
       5              1.337    1.671    2.339    3.129
       6              1.276    1.594    2.235    3.000
       7              1.230    1.537    2.158    2.903


The proposed procedure for the detection of outliers is the same as the one defined in §4, but for the
new choice of the critical value $k$. The following example taken from McKay (1935) and Nair (1948)
shows how to apply this test for outliers.

In the course of routine testing of a standard leather product of a tannery, five parallel tests yielded
the following values for the hide substance content of the leather specimens: 32.44, 36.45, 39.64, 40.13,
41.09. The first observation appears unduly low. Long experience of the product in question has
established a value of 2.226 for the standard deviation. The value of $u$ is found to be 1.801, which is significant
at the 5% level.
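
The arithmetic of this outlier example is short enough to sketch directly. The code below is illustrative: the 5% critical values are copied from Table 2, the statistic is applied to the two smallest values because the suspected outlier is low, and the names are invented for this sketch.

```python
# 5% critical values from Table 2, keyed by the number of populations n + 1.
CRITICAL_5_PERCENT = {3: 1.957, 4: 1.780, 5: 1.671, 6: 1.594, 7: 1.537}

def low_outlier_statistic(values, sigma):
    """Standardized difference between the two smallest observations."""
    ordered = sorted(values)
    return (ordered[1] - ordered[0]) / sigma

values = [32.44, 36.45, 39.64, 40.13, 41.09]
u = low_outlier_statistic(values, sigma=2.226)
significant = u > CRITICAL_5_PERCENT[len(values)]
print(round(u, 3), significant)
```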


6. EVALUATION OF PERFORMANCE 


In order to evaluate the performance of the I-procedure, a comparison was made with the M-procedure
already described in the preceding paper (1958).
The critical values for the M-procedure are:


Table 3

No. of populations                 Level
      n+1             0.10     0.05     0.01     0.001
       4              1.696    1.941    2.431    3.014
       5              1.835    2.080    2.574    3.166

Random numbers from a normal population were taken from Dixon & Massey (1951) and 30,000 tests
were carried out in the case of 4 populations, and 20,000 tests with 5 populations. These tests were not
independent because four (or five) random numbers were taken and tested after adjustment for the
8 different values of $\gamma$. This was then repeated for the remaining 3 (or 4) cyclical permutations. This
being done, the first number was discarded and a new number was introduced in the last position and the
procedure repeated. Even if these tests were not independent, this does not invalidate the comparison
between the two procedures. The following results were obtained.

Table 4. Four populations
(a) Probability of decision with the I-procedure

Level       10%             5%              1%              0.1%
 γ        G      W        G      W        G      W        G      W
0.0     0.030  0.091    0.015  0.044    0.003  0.010    0.000  0.001
0.5     0.076  0.063    0.041  0.031    0.009  0.007    0.002  0.001
1.0     0.154  0.041    0.092  0.020    0.026  0.004    0.005  0.000
1.5     0.268  0.023    0.183  0.011    0.069  0.002    0.013  0.000
2.0     0.423  0.011    0.311  0.005    0.141  0.001    0.037  0.000
2.5     0.585  0.004    0.469  0.002    0.252  0.000    0.086  0.000
3.0     0.738  0.001    0.632  0.000    0.398  0.000    0.175  0.000
3.5     0.851  0.000    0.775  0.000    0.564  0.000    0.299  0.000

(b) Probability of decision with the M-procedure

0.0     0.023  0.069    0.011  0.034    0.003  0.008    0.000  0.001
0.5     0.058  0.048    0.031  0.023    0.008  0.005    0.001  0.001
1.0     0.129  0.033    0.078  0.016    0.024  0.003    0.004  0.000
1.5     0.249  0.022    0.165  0.011    0.060  0.002    0.013  0.000
2.0     0.406  0.015    0.300  0.008    0.133  0.001    0.036  0.000
2.5     0.585  0.010    0.468  0.005    0.254  0.001    0.087  0.000
3.0     0.747  0.006    0.645  0.003    0.414  0.000    0.181  0.000
3.5     0.865  0.003    0.795  0.001    0.593  0.000    0.322  0.000

G stands for good decision, W stands for wrong decision.

Table 5. Five populations
(a) Probability of decision with the I-procedure

Level       10%             5%              1%              0.1%
 γ        G      W        G      W        G      W        G      W
0.0     0.022  0.089    0.011  0.046    0.003  0.011    0.000  0.001
0.5     0.056  0.067    0.032  0.034    0.009  0.009    0.002  0.001
1.0     0.131  0.051    0.081  0.027    0.024  0.005    0.005  0.000
1.5     0.239  0.032    0.163  0.016    0.059  0.003    0.014  0.000
2.0     0.389  0.017    0.287  0.008    0.131  0.001    0.038  0.000
2.5     0.554  0.006    0.447  0.003    0.243  0.000    0.085  0.000
3.0     0.708  0.002    0.610  0.001    0.388  0.000    0.173  0.000
3.5     0.828  0.000    0.750  0.000    0.553  0.000    0.299  0.000

(b) Probability of decision with the M-procedure

0.0     0.023  0.090    0.011  0.046    0.003  0.011    0.000  0.002
0.5     0.056  0.068    0.032  0.034    0.008  0.009    0.002  0.001
1.0     0.118  0.046    0.075  0.025    0.024  0.006    0.005  0.000
1.5     0.234  0.033    0.157  0.018    0.061  0.004    0.015  0.000
2.0     0.394  0.025    0.290  0.014    0.130  0.002    0.039  0.000
2.5     0.574  0.017    0.463  0.009    0.256  0.002    0.095  0.000
3.0     0.736  0.011    0.641  0.006    0.420  0.001    0.189  0.000
3.5     0.861  0.006    0.794  0.003    0.603  0.000    0.337  0.000


An examination of these results shows that no striking difference exists between the performances of
these two procedures. There is a slight indication that, for a given level, the probabilities of a good and
a wrong decision are a little higher for the M-procedure.


Longest run of consecutive observations having a specified attribute


By E. J. BURR* AND GWENDA CANE
University of New England, Armidale, N.S.W., Australia


1. INTRODUCTION


Consider an ordered finite sequence of observations which are classified according to possession or
non-possession of a specified attribute. The ordering may be temporal, or spatial, or according to any
other property measurable on an ordinal scale without ties. In various applications, the attribute
may be described by terms such as success, male, defective, positive, greater than the mean, greater
than the gth percentile, and so on. In a distribution-free comparison of two samples, after pooling the
observations into a single sequence the attribute would be 'belonging to the first sample'. We shall use
the terms trial, success and failure to denote an observation, possession and non-possession of the
attribute.

In a sequence of $n$ trials resulting in $r$ successes and $s = n-r$ failures, let $k_0$ be the length of the longest
run of consecutive successes observed. On the null hypothesis that the successes and failures occur in
random order, $k_0$ is a statistic with a known probability distribution. There are many alternative
hypotheses under which we might expect to obtain significantly large or significantly small values of $k_0$;
for example, local (or temporary) increases in the probability of success, or various types of serial
correlation.

Baticle (1935, 1946) and Garwood (1940) have used an asymptotic formula for the distribution of $k_0$,
valid when $s/n \to 0$. Barton & David (1959) propose another asymptotic formula which is always
useful in the upper tail, but they remark that it is inaccurate, for moderate values of $n$, in the mid-range
and lower tail.

In this paper, several asymptotic formulae are compared and their domains of usefulness are
discussed, with particular emphasis on the lower tail. Alternative formulae are given for the probability
that $k_0 = 1$, and a table is given for determining when the values $k_0 = 1$, 2 or 3 are significantly small in
all cases in which no suitable asymptotic formula is available.


2. EXACT FORMULAE 
Let $n$ ordered trials result in $r$ successes and $s$ failures $(0 < r < n,\ s = n-r)$. When these are permuted
in all possible ways, the null hypothesis asserts that the $\binom{n}{r}$ permutations are equally probable.

For each integer $k$ in $1 \le k \le r+1$, let $P(k)$ be the probability (when one of these permutations is
selected at random) that the longest success run will have length $k_0$ less than $k$; that is,

$$P(k) = \Pr\{k_0 < k \mid n, r\}. \qquad (1)$$

Let $t = s+1$. The $s$ failures divide the sequence of successes into precisely $t$ runs (possibly including
one or more 'runs of length zero'). Therefore the number of permutations with longest success
run $< k$ is equal to the number of ways of selecting in order $t$ integers whose sum is $r$, each contained in
the set $0, 1, 2, \ldots, k-1$; and this equals the coefficient of $x^r$ in the expansion of

$$(1 + x + x^2 + \ldots + x^{k-1})^t. \qquad (2)$$

Hence by multiplying the binomial expansions in $(1-x^k)^t(1-x)^{-t}$ and dividing the coefficient of $x^r$
by $\binom{n}{r}$ we obtain

$$\binom{n}{s}P(k) = \binom{n}{s} - \binom{t}{1}\binom{n-k}{s} + \binom{t}{2}\binom{n-2k}{s} - \ldots. \qquad (3)$$

* This paper was revised with the partial support of a National Science Foundation Grant at
Stanford University.


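For each integer $b > 0$ let $a^{(b)} = a(a-1)\ldots(a-b+1)$ if $a \ge b$, $a^{(b)} = 0$ if $a < b$. Then (3) may be written
in the forms

$$P(k) = 1 - \binom{t}{1}\frac{(n-k)^{(s)}}{n^{(s)}} + \binom{t}{2}\frac{(n-2k)^{(s)}}{n^{(s)}} - \ldots, \qquad (4)$$

$$P(k) = 1 - \binom{t}{1}\frac{r^{(k)}}{n^{(k)}} + \binom{t}{2}\frac{r^{(2k)}}{n^{(2k)}} - \binom{t}{3}\frac{r^{(3k)}}{n^{(3k)}} + \ldots, \qquad (5)$$

$$P(k) = 1 - t\,\frac{r^{(k)}}{n^{(k)}}\left[1 - \frac{t-1}{2}\,\frac{(r-k)^{(k)}}{(n-k)^{(k)}}\left[1 - \frac{t-2}{3}\,\frac{(r-2k)^{(k)}}{(n-2k)^{(k)}}\left[1 - \ldots\right]\right]\right]. \qquad (6)$$

The form (6) is convenient for use with a desk calculator when $k$ is not large. In (5) and (6) it is to be
understood that a fraction is zero if its numerator is zero.
Let $N = tk-r-1$ and $R = N-s$. From the fact that the coefficient of $x^r$ in (2) equals that of $x^{tk-t-r}$
we deduce

$$\binom{n}{r}P(k) = \binom{N}{s} - \binom{t}{1}\binom{N-k}{s} + \binom{t}{2}\binom{N-2k}{s} - \ldots, \qquad (3')$$

$$P(k) = \frac{N^{(s)}}{n^{(s)}} - \binom{t}{1}\frac{(N-k)^{(s)}}{n^{(s)}} + \binom{t}{2}\frac{(N-2k)^{(s)}}{n^{(s)}} - \ldots, \qquad (4')$$

$$P(k) = \frac{N^{(s)}}{n^{(s)}}\left\{1 - \binom{t}{1}\frac{R^{(k)}}{N^{(k)}} + \binom{t}{2}\frac{R^{(2k)}}{N^{(2k)}} - \ldots\right\}. \qquad (5')$$

We could also have deduced (4') from the fact that the sum of the series (4) becomes identically zero if
it is extended to $t+1$ terms by removing the restriction that $a^{(b)} = 0$ for $a < b$. The series (3'), (4'), (5')
converge more rapidly than (3), (4), (5) when $N < n$, that is, when $k < 1 + 2r/t$. To test whether an
observed value $k_0$ is significantly small we require the value of $P(k_0+1)$, for which one of (3'), (4') or (5')
is to be preferred when $r/t < k_0 < 2r/t$.

In the particular case $k = 2$, $P(2)$ is the probability that $k_0 = 1$, that is, that all successes are isolated.
It is used to determine whether an observed value $k_0 = 1$ is significantly small, or whether a value $k_0 = 2$
is significantly large. In this case (2) becomes $(1+x)^t$, so that

$$P(2) = \frac{t!\,(n-r)!}{n!\,(t-r)!} = \frac{t^{(r)}}{n^{(r)}}. \qquad (7)$$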


Other properties and applications of the coefficients in (2) are given by Freund & Pozner (1956). 
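
The exact formulae of this section are easy to check by machine. The sketch below is illustrative: it evaluates the alternating series in the binomial-coefficient form of (3) with exact rational arithmetic and compares it with a direct enumeration of all placements of the successes for a small case. The function names are invented for this sketch.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_exact(k, n, r):
    """P(k) = Pr{longest success run < k | n, r} from the alternating series."""
    s = n - r
    t = s + 1
    total = sum((-1) ** j * comb(t, j) * comb(n - j * k, s)
                for j in range(r // k + 1))
    return Fraction(total, comb(n, s))

def p_brute_force(k, n, r):
    """The same probability by enumerating every placement of the r successes."""
    favourable = 0
    for positions in combinations(range(n), r):
        chosen = set(positions)
        longest = run = 0
        for i in range(n):
            run = run + 1 if i in chosen else 0
            longest = max(longest, run)
        favourable += longest < k
    return Fraction(favourable, comb(n, r))

n, r = 9, 4
for k in range(1, r + 2):
    print(k, p_exact(k, n, r), p_brute_force(k, n, r))
```

For $k = 2$ the series collapses to the closed form (7), which for $n = 9$, $r = 4$ is $\binom{6}{4}/\binom{9}{4} = 5/42$.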


3. APPROXIMATIONS 


If tables of binomial coefficients are not available and $k$ and $s$ (or $r$ in (7)) are large, the terms in
formulae (4) to (7) may be computed from

$$\log(a^{(c)}/b^{(c)}) \simeq a'\log a' + (b'-c)\log(b'-c) - b'\log b' - (a'-c)\log(a'-c) + (c/24)\,[(a'^2-a'c)^{-1} - (b'^2-b'c)^{-1}],$$

where $a' = a+\tfrac12$ and $b' = b+\tfrac12$, and the relative error in $a^{(c)}/b^{(c)}$ is about 1% at most (when $a = c < b$),
but is usually much smaller. In particular, (7) yields

with relative error about 8% when $r = t$ but less than 3% when $r < t$. If $r/t$ is small we may use
logarithmic power series to show from (8) that

$$P(2) \simeq \exp\left[-\frac{r^{(2)}}{t}\left\{1 + \frac{r^{(2)}}{6t^2}\right\}\right] \qquad (9)$$

with relative error of order $r^6/(15t^5)$.
Barton & David (1959) derive (5) and show that $P(k)$ lies between the sum of the first $j$ terms and the
sum of the first $j+1$ terms of the series $(j = 1, 2, 3, \ldots)$. They conclude that the formula

$$P(k) \simeq e^{-tv}, \qquad (10)$$

where $v = r^{(k)}/n^{(k)}$, yields $P(k)$ with error less than $\tfrac12(tv)^2$, so that (10) is sufficiently accurate when $tv$
is small, that is, in the upper tail of the distribution.

In the lower tail (e.g. $P(k) < 0.05$), we usually have $tv > 2$, the series (3), (4), (5) converge slowly,
and the labour involved in a direct summation may be prohibitive. If it happens that $N < n$ we can say
at once that $P(k) < N^{(s)}/n^{(s)}$, and if also $tR^{(k)}/N^{(k)}$ is small we have rapid convergence in (3'), (4') and (5').
Otherwise, we seek to modify the terms of (4) or (5).

Baticle (1935, 1946) and Garwood (1940) modify the terms of (4), for fixed $s$ and large $n$ and $k$, by
writing

$$(n-jk)^{(s)}/n^{(s)} \simeq (1-j\alpha)^s \quad (j = 1, 2, \ldots, [r/k]), \qquad (11)$$

where $\alpha = k/n$. The relative error in (11) is about $j\alpha s^2/(2n)$, but this may be reduced by changing $\alpha$
as follows.

Let $n_1 = n-\tfrac12(s-1)$ and $\beta = k/n_1$. Then by using logarithmic power series we obtain

$$(n-jk)^{(s)}/n^{(s)} \simeq (1-j\beta)^s \exp[-(s^3-s)\,j\beta(2-j\beta)/\{24n_1^2(1-j\beta)^2\}] \qquad (12)$$

with relative error $\simeq \{(1-j\beta)^{-4}-1\}s^5/(320n^4)$. Hence if $\alpha$ is replaced by $\beta$ in (11) the relative error there
is reduced to about $j\beta s^3/(12n^2)$. The terms of (4') may be treated similarly.

Now let $k$ be small and $r$, $n$ large, with $s/n$ no longer necessarily small. Let $u = (r/n)^k$. Then
$r^{(jk)}/n^{(jk)} \simeq u^j$, so that (5) yields

$$P(k) \simeq (1-u)^t. \qquad (13)$$

To make this more precise, it may be shown by using logarithmic power series that

$$r^{(jk)}/n^{(jk)} = u^j\exp\{-\epsilon_1(j^2k^2-jk) - O(\epsilon_2 j^3k^3)\},$$

where $\epsilon_1 = \tfrac12(r^{-1}-n^{-1})$ and $\epsilon_2 = \tfrac12(r^{-2}-n^{-2})$. Hence if $\epsilon_1 j^2k^2$ is small, we have

$$r^{(jk)}/n^{(jk)} \simeq u^j\{1-\epsilon_1(j^2k^2-jk)\} \qquad (14)$$

with relative error $\tfrac12\epsilon_1^2 j^4k^4$. Using (14), the series (5) may be summed, giving

$$P(k) \simeq (1-u)^t\{1 - \epsilon_1k^2\,tu(tu-1)(1-u)^{-2} - \epsilon_1k\,tu(1-u)^{-1}\}. \qquad (15)$$

There is no simple formula for the error in (15), but for $k \le 4$ it has been found sufficiently accurate
(relative error < 10%), down to $P = 0.01$ in the lower tail, for all $r > 20k$.

In the lower tail, (13) shows that $tu > 1$, and hence (15) shows that $(1-u)^t$ overestimates $P(k)$. Since
$e^{-tv} > (1-u)^t$, it follows that (13) is a better approximation than (10) in the lower tail. Near $P = 0.01$
we have, from (13), $tu \simeq \log 100$, and (15) shows that the relative error in (13) is about $16\epsilon_1k^2$. Hence
(13) may be used down to the 1% level in the lower tail if $r > 80k^2(1-r/n)$.
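
The approximations (10), (13) and (15) can be compared with the exact series numerically. The sketch below is illustrative only; the case $n = 205$, $r = 60$, $k = 3$ is chosen because it lies near the 1% point of the lower tail, and the function names are invented here.

```python
from math import comb, exp

def p_exact(k, n, r):
    """Exact P(k) from the alternating series in binomial coefficients."""
    s = n - r
    t = s + 1
    total = sum((-1) ** j * comb(t, j) * comb(n - j * k, s)
                for j in range(r // k + 1))
    return total / comb(n, s)

def p_10(k, n, r):
    """Approximation (10): exp(-t v) with v = r^(k) / n^(k)."""
    t = n - r + 1
    v = 1.0
    for i in range(k):
        v *= (r - i) / (n - i)
    return exp(-t * v)

def p_13(k, n, r):
    """Approximation (13): (1 - u)^t with u = (r/n)^k."""
    t = n - r + 1
    u = (r / n) ** k
    return (1 - u) ** t

def p_15(k, n, r):
    """Approximation (15): (13) multiplied by its first-order correction."""
    t = n - r + 1
    u = (r / n) ** k
    e1 = 0.5 * (1 / r - 1 / n)
    tu = t * u
    correction = (1 - e1 * k ** 2 * tu * (tu - 1) / (1 - u) ** 2
                  - e1 * k * tu / (1 - u))
    return (1 - u) ** t * correction

n, r, k = 205, 60, 3
for name, fn in (("exact", p_exact), ("(10)", p_10), ("(13)", p_13), ("(15)", p_15)):
    print(name, fn(k, n, r))
```

In this case the ordering (10) > (13) > exact illustrates the remark that (13) overestimates $P(k)$ in the lower tail but is still closer than (10).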


4. TABLE OF CRITICAL POINTS 


It seems likely that in practical applications, significantly small values of $k_0$ will usually be 1, 2 or 3.
For these $k_0$, and for certain values of $r$, the accompanying table gives the critical values of $n$ below which
the observed value $k_0$ is significantly small at the 5% level or the 1% level (one-tailed). In order to
conserve space, only even values of $r$ have been included. When interpolating to odd values of $r$, with
$k_0 = 1$, a second-difference correction of $-0.3$ should be applied when $P = 0.05$, and $-0.2$ when $P = 0.01$.
For $k_0 = 2$ or 3, the second-difference correction is negligible. For higher values of $r$, formulae (9) and
(15) are sufficiently accurate at the 5% and 1% levels.

Of course, only integral values of $n$ are meaningful, but the critical values are given 'correct to one
decimal place' in order to make accurate interpolation possible, and also to avoid the question whether
the tabulated values fall just inside or just outside the boundary of the critical region.

To illustrate the use of the table, consider a sequence of $n$ trials $(n > 60)$ containing 60 successes, and
suppose that $k_0 = 2$. Then the table shows that this value of $k_0$ is significantly small at the 5% level
(one-tailed) if $n < 252$, and is significantly small at the 1% level if $n < 205$.

To demonstrate the adequacy of (15) at the limit of the table, consider the case $n = 205$, $r = 60$,
$k_0 = 2$. Here the true value of $P(3)$ from (5) is 0.00976, while (15) yields the approximate value 0.00957,
with a relative error of only 2%. But (13) yields the value 0.0245, which is grossly inaccurate.

Critical points for $k_0 = 1, 2, 3$


For each $r$, $k_0$ and $P$, the value of $k_0$ is significantly small at the level $P$ (one-tailed) if $n$ is less than the
tabulated value. For example, if $r = 38$, the probability that $k_0 < 3$ is less than 0.05 if $n < 79$, and less
than 0.01 if $n < 69$.

           k_0 = 1                k_0 = 2                k_0 = 3
 r     P = 0.05  P = 0.01     P = 0.05  P = 0.01     P = 0.05  P = 0.01
 4       7.5        —            —         —            —         —
 6      15.4      12.3          8.1        —            —         —
 8      26.1      19.9         12.6      11.2           —         —
10      39.6      29.3         17.5      15.2         13.2        —


The moments of the non-central t-distribution*


By D. HOGBEN, R. S. PINKHAM AND M. B. WILK
The Statistics Center, Rutgers University, New Jersey

1. INTRODUCTION AND SUMMARY 


The following note presents a procedure for finding simple explicit expressions for the raw moments
of the non-central $t$. A table of numerical values relevant to the first four central moments and cumulants
is given. Application of the table is illustrated in finding the approximate values of the expectation of
certain functions of the non-central $t$.

If $X$ is normally distributed with mean $\delta$ and variance 1, and $Y^2$ is independently distributed as central
$\chi^2$ with $f$ degrees of freedom, then the random variable $t = t(\delta, f) = X\sqrt{f}/Y$ is said to have a non-central
$t$-distribution. The density function for this random variable is

$$p(t) = \frac{f!}{2^{\frac12(f-1)}\,\Gamma(\tfrac12 f)\,(\pi f)^{\frac12}}\left(\frac{f}{f+t^2}\right)^{\frac12(f+1)}\exp\left[-\frac12\left(\frac{f\delta^2}{f+t^2}\right)\right]Hh_f\left(\frac{-\delta t}{(f+t^2)^{\frac12}}\right),$$

where

$$Hh_f(z) = \int_0^{\infty}\frac{v^f}{f!}\,e^{-\frac12(v+z)^2}\,dv.$$

When $\delta = 0$, this reduces to the usual central Student's $t$-distribution.

Fisher (1931) gave a derivation of the distribution, discussed computational procedures and gave
large sample approximations to the variance, $\gamma_1$ and $\gamma_2$ of the non-central $t$. Johnson & Welch (1940)
presented tables of the non-central $t$ and listed the first three central moments. More extensive tables of
the probability integral, the density function and the percentage points of the non-central $t$ for a
restricted set of values of $\delta$ are given by Resnikoff & Lieberman (1957). Azorin (1952) used the moments
of a truncated normal distribution to obtain the frequency function and the moments.

In a recent paper Merrington & Pearson (1958) calculated the first four moments for a considerable
number of non-central $t$-distributions, in order to examine the adequacy of representing the distribution
by a Pearson Type IV curve. The formulae which they used were in similar form to those given on p. 3,
but no table of the values of the $c$'s as functions of $f$ was available to them. For the mean of $t$, involving
our constant $c_{11}$, they made direct use, however, of a table of the expectation of $\chi/\sqrt{f}$ (Pearson & Hartley,
1954, Table 35) which up to $f = 100$ has very similar argument entries to our table.


2. DERIVATION OF RAW MOMENTS 


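Attempts to determine the moments of the non-central t directly using its density may lead to complexities. The moments can, however, be found explicitly by the following indirect procedure.
The joint density for (X, Y²) is

\[ p(x, y^2) = [\Gamma(\tfrac12 f)\, 2^{\frac12(f+1)} \sqrt{\pi}]^{-1}\, y^{f-2}\, e^{-\frac12 y^2}\, e^{-\frac12 (x-\delta)^2}. \]

Since p is a density,

\[ \int_{-\infty}^{\infty}\int_0^{\infty} y^{f-2}\, e^{-\frac12(x^2+y^2)+\delta x}\, dy^2\, dx = [\Gamma(\tfrac12 f)\, 2^{\frac12(f+1)} \sqrt{\pi}]\, e^{\frac12\delta^2}. \]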


* This work was supported by the Office of Naval Research. 
† Prof. Pearson writes that our table would undoubtedly have shortened Mrs Merrington's and his
work in computing the third and fourth moments of non-central t.
Continued differentiation of both sides k times with respect to δ gives
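\[ \int_{-\infty}^{\infty}\int_0^{\infty} x^k y^{f-2}\, e^{-\frac12(x^2+y^2)+\delta x}\, dy^2\, dx = [\Gamma(\tfrac12 f)\, 2^{\frac12(f+1)} \sqrt{\pi}]\, D^k e^{\frac12\delta^2}, \]

where D denotes differentiation with respect to δ.

Since t = X√f / Y, it follows from replacing f by f − k in the above identity and dividing through by the
appropriate normalizing constant, that the kth raw moment of the non-central t is

\[ E(t^k) = (\tfrac12 f)^{\frac12 k}\, \frac{\Gamma(\tfrac12 (f-k))}{\Gamma(\tfrac12 f)}\, e^{-\frac12\delta^2} D^k e^{\frac12\delta^2}. \]

The function $e^{-\frac12\delta^2} D^k e^{\frac12\delta^2}$ is similar to the Hermite polynomial $e^{\frac12\delta^2} D^k e^{-\frac12\delta^2}$, except all coefficients of
δ are positive; it satisfies similar recurrence relationships (Fisher, 1931).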
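
As an illustration of the raw-moment formula, the following Python sketch evaluates E(t^k) numerically. It assumes the recurrence H_{j+1}(δ) = δH_j(δ) + jH_{j−1}(δ) for the polynomial exp(−δ²/2) D^k exp(δ²/2), which follows from Leibniz's rule; the function names `hermite_like` and `raw_moment` are hypothetical and are not taken from the paper.

```python
import math


def hermite_like(k, delta):
    """Value of exp(-delta^2/2) * D^k exp(delta^2/2).

    Uses H_0 = 1, H_1 = delta and H_{j+1} = delta*H_j + j*H_{j-1}
    (an assumption derived from Leibniz's rule, not quoted from the paper).
    """
    prev, cur = 0.0, 1.0  # prev is a placeholder until j >= 1
    for j in range(k):
        prev, cur = cur, delta * cur + j * prev
    return cur


def raw_moment(k, f, delta):
    """k-th raw moment of the non-central t with f degrees of freedom."""
    if f <= k:
        raise ValueError("the k-th moment exists only for f > k")
    # Ratio of gamma functions computed on the log scale for stability.
    log_ratio = math.lgamma((f - k) / 2.0) - math.lgamma(f / 2.0)
    return (f / 2.0) ** (k / 2.0) * math.exp(log_ratio) * hermite_like(k, delta)


if __name__ == "__main__":
    # Second raw moment of the central t with 10 d.f. equals f/(f-2).
    print(raw_moment(2, 10, 0.0))
```

With δ = 0 and k = 2 the sketch reduces to f/(f − 2), the variance of the central t, which gives a quick sanity check.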


3. CENTRAL MOMENTS AND CUMULANTS 
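The central moments and cumulants are polynomial functions of δ whose coefficients are functions
of f. Thus

\[ \mu = c_{11}\delta, \quad \mu_2 = c_{22}\delta^2 + c_{20}, \quad \mu_3 = c_{33}\delta^3 + c_{31}\delta, \]

\[ \mu_4 = c_{44}\delta^4 + c_{42}\delta^2 + c_{40}, \quad \kappa_4 = \mu_4 - 3\mu_2^2, \]

where

\[ c_{11} = \sqrt{(\tfrac12 f)}\,\Gamma(\tfrac12(f-1))/\Gamma(\tfrac12 f), \]

\[ c_{22} = f/(f-2) - c_{11}^2, \quad c_{20} = f/(f-2), \]

\[ c_{33} = c_{11}\left[\frac{f(7-2f)}{(f-2)(f-3)} + 2c_{11}^2\right], \quad c_{31} = \frac{3f\,c_{11}}{(f-2)(f-3)}, \]

\[ c_{44} = \frac{f^2}{(f-2)(f-4)} - \frac{2f(5-f)\,c_{11}^2}{(f-2)(f-3)} - 3c_{11}^4, \]

\[ c_{42} = 6f\left[\frac{f}{(f-2)(f-4)} - \frac{(f-1)\,c_{11}^2}{(f-2)(f-3)}\right], \]

\[ c_{40} = \frac{3f^2}{(f-2)(f-4)}. \]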


The coefficients $c_{ij}$ are tabulated in Table 1 for 45 values of f from 2 to 1000. The corresponding central
moments and cumulants may then be written and computed as polynomials in δ. Table 1 is accurate to
as many digits as shown. For intermediate values of f it is recommended that $c_{20}$ and $c_{40}$ be computed
directly, whereas linear interpolation on 1/(f − a) be used for $c_{11}$, $c_{22}$, $c_{33}$, $c_{31}$, $c_{44}$ and $c_{42}$, where a equals
1, 3, 9.8, 4.8, 11.4 and 7.5, respectively.

Additional tables giving central and raw moments and certain other relevant functions with computation and interpolation procedures for selected f between 2 and 1000 and δ from 0 to 10 are given by
Hogben, Pinkham & Wilk (1960).
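
For values of f not in Table 1, the coefficients can also be computed directly rather than interpolated. The sketch below is one way to do so in Python, assuming the closed forms of this section as reconstructed from the raw moments; the function name `c_coefficients` and the dictionary keys are hypothetical.

```python
import math


def c_coefficients(f):
    """Coefficients c_ij of the central moments for f > 4 degrees of freedom."""
    if f <= 4:
        raise ValueError("all eight coefficients exist only for f > 4")
    c11 = math.sqrt(f / 2.0) * math.exp(
        math.lgamma((f - 1) / 2.0) - math.lgamma(f / 2.0)
    )
    a = f / (f - 2.0)                      # f/(f-2)
    b = f / ((f - 2.0) * (f - 3.0))        # f/((f-2)(f-3))
    d = f * f / ((f - 2.0) * (f - 4.0))    # f^2/((f-2)(f-4))
    return {
        "c11": c11,
        "c22": a - c11 ** 2,
        "c20": a,
        "c33": c11 * (b * (7.0 - 2.0 * f) + 2.0 * c11 ** 2),
        "c31": 3.0 * b * c11,
        "c44": d - 2.0 * b * (5.0 - f) * c11 ** 2 - 3.0 * c11 ** 4,
        "c42": 6.0 * (d - b * (f - 1.0) * c11 ** 2),
        "c40": 3.0 * d,
    }


if __name__ == "__main__":
    for name, value in c_coefficients(10).items():
        print(name, round(value, 6))
```

For f = 10 the values of c11, c22, c31 and c40 can be compared with the corresponding entries of Table 1.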


4. TWO EXAMPLES


Moments of functions of random variables may often be well-approximated in terms of Taylor's
series involving the central moments of the original variable.
Thus if Z = g(t), then

\[ E(Z) \approx \sum_{i \ge 0} \frac{\mu_i}{i!} \left[\frac{d^i g(t)}{dt^i}\right]_{t=\mu}, \]

where μ = E(t), and $\mu_i = E(t-\mu)^i$, i ≥ 0.
Resnikoff & Lieberman (1957) consider the average value of

\[ p = g(t) = \int_{t/\sqrt{(f+1)}}^{\infty} \frac{1}{\sqrt{(2\pi)}}\, e^{-\frac12 x^2}\, dx, \]

where t has a non-central t-distribution with f = 16 and δ = 8.08114.

Proceeding by direct numerical integration, using their tables, they find E(p) = 0.0294. Proceeding
as suggested above, one finds an approximation to E(p) using the first four central moments of the non-central t-distribution. This value is 0.0295, which is about 1 in 300 greater than Resnikoff & Lieberman's estimate.
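
The four-moment Taylor approximation for this first example can be sketched in Python as follows. The sketch assumes that p is the upper tail area of the standard normal distribution beyond t/√(f + 1), and that the central moments are given by the closed-form coefficients of Section 3; the function names are hypothetical.

```python
import math


def central_moments(f, delta):
    """Mean and central moments mu2, mu3, mu4 of the non-central t (f > 4)."""
    c11 = math.sqrt(f / 2.0) * math.exp(
        math.lgamma((f - 1) / 2.0) - math.lgamma(f / 2.0)
    )
    a = f / (f - 2.0)
    b = f / ((f - 2.0) * (f - 3.0))
    d = f * f / ((f - 2.0) * (f - 4.0))
    mean = c11 * delta
    mu2 = (a - c11 ** 2) * delta ** 2 + a
    mu3 = (c11 * (b * (7.0 - 2.0 * f) + 2.0 * c11 ** 2) * delta ** 3
           + 3.0 * b * c11 * delta)
    mu4 = ((d - 2.0 * b * (5.0 - f) * c11 ** 2 - 3.0 * c11 ** 4) * delta ** 4
           + 6.0 * (d - b * (f - 1.0) * c11 ** 2) * delta ** 2
           + 3.0 * d)
    return mean, mu2, mu3, mu4


def expected_tail_area(f, delta):
    """Taylor approximation to E(p), p = 1 - Phi(t / sqrt(f + 1)) (assumed form)."""
    mean, mu2, mu3, mu4 = central_moments(f, delta)
    s = math.sqrt(f + 1.0)
    u = mean / s
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    g0 = 0.5 * math.erfc(u / math.sqrt(2.0))   # g(mu) = 1 - Phi(u)
    g2 = u * phi / s ** 2                      # second derivative of g at mu
    g3 = (1.0 - u * u) * phi / s ** 3          # third derivative
    g4 = (u ** 3 - 3.0 * u) * phi / s ** 4     # fourth derivative
    # The i = 1 term vanishes because the first central moment is zero.
    return g0 + mu2 * g2 / 2.0 + mu3 * g3 / 6.0 + mu4 * g4 / 24.0


if __name__ == "__main__":
    print(expected_tail_area(16, 8.08114))
```

Under these assumptions the result for f = 16 and δ = 8.08114 can be compared with the value 0.0295 quoted in the text.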

As another example, suppose b is distributed as a non-central beta variable with one degree of freedom
in the numerator, then

\[ b = g(t) = t^2/(f+t^2), \]

where t is the non-central t.



















It has been shown by Hogben, Pinkham & Wilk (1961) that

\[ E(b) = 1 - \frac{f\, e^{-\frac12\delta^2}}{\delta^{f+1}} \int_0^{\delta} x^f e^{\frac12 x^2}\, dx. \]

For the values f = 9 and δ = 1, the correct value of E(b) is 0.1700, while the series approximation with
four terms yields an estimate of 0.1730.
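
The exact value quoted for this second example can be checked by numerical quadrature. The Python sketch below assumes the integral form E(b) = 1 − f exp(−δ²/2) δ^−(f+1) ∫₀^δ x^f exp(x²/2) dx and uses a composite Simpson rule; the helper names `simpson` and `expected_b` are hypothetical, and δ must be positive.

```python
import math


def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def expected_b(f, delta):
    """E(b) for b = t^2/(f + t^2), t non-central t with f d.f. and delta > 0."""
    if delta <= 0:
        raise ValueError("this form of the integral needs delta > 0")
    integral = simpson(lambda x: x ** f * math.exp(0.5 * x * x), 0.0, delta)
    return 1.0 - f * math.exp(-0.5 * delta ** 2) * integral / delta ** (f + 1)


if __name__ == "__main__":
    print(expected_b(9, 1.0))
```

For f = 9 and δ = 1 the result can be compared with the value 0.1700 given above.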


Table 1. Non-centrality coefficients in the moments of the non-central t 


f $c_{11}$
2 1-77245 
3 1-38198 
4 1-25331 
5 1-18942 
6 1-15124 
7 1-12587 
8 1-10778 
9 1:09424 
10 1-08372 
11 1-07532 
12 1-06844 
13 1-06272 
14 1-05788 
15 1-05373 
16 1-05014 
17 1-04700 
18 1-04423 
19 1-04176 
20 1-03956 
21 1-03758 
22 1:03579 
23 1:03416 
24 1:03267 
25 1-03130 
30 1:02590 
35 1-02209 
40 1-01925 
45 1-01706 
50 1-01532 
60 1-01272 
70 1-01088 
75 1-01014 
80 1-00950 
90 1-00843 
100 1-00758 
150 1-00503 
200 1:00377 
300 1-00251 
400 1-00188 


500 1-00150 
600 1-00125 


700 1-00107 
800 1-00094 
900 1-00083 


1000 1-00075 


$c_{22}$ (f = 3 to 1000)
1-090141 
0-429204 

-251956 
-174641 
-132419 
-106149 
-0883494 
-0755460 


-0659193 
-0584307 
-0524464 
*0475588 
-0434943 
-0400628 
-0371282 
-0345905 
-0323749 
-0304241 
-0286935 
-0271480 
-0257595 
-0245054 
-0233672 


-0189588 
-0159462 
-0137581 
-0120972 
‘0107936 
-00887919 
-00754117 
-00701270 
-00655342 
00579435 
-00519282 


-00341823 
00254753 
-00168769 
-00126180 
-00100754 
-0°838566 
-0°718127 
-0°627940 
-09557878 
-0°501880 


$c_{20}$ (f = 3 to 1000)
3-00000 
2-00000 
1-66667 
1-50000 
1-40000 
1-33333 
1-28571 
1-25000 


122222 
1-20000 
1-18182 
1-16667 
1-15385 
1-14286 
113333 
1-12500 
1-11765 
111111 
110526 
1-10000 
1-09524 
1-09091 
1-08696 


1-07143 
1:06061 
1-:05263 
1-04651 
1-:04167 
1-:03448 
1-02941 
1-02740 
1-02564 
1-02273 
1-02041 


1-01351 
1-01010 
1-00671 
1-00503 
1-00402 
1-00334 
1-00287 
1-00251 
1-00223 
1-00200 


$c_{33}$ (f = 4 to 1000)


1-430774 

0-391819 
-173514 
-0958806 
-0602294 
-0411291 
-0297802 


-0225163 

-0175994 

-0141225 

-0115764 

-00965776 
-00817696 
-00701076 
-00607629 
-00531620 
-00468978 
‘00416752 
-00372761 
-00335365 
-00303312 
-00275632 


-00181055 
-00127910 
-0°951236 
-0°734909 
-0°584744 
09395412 
-0°285075 
-0°246471 
-0°215207 
-0°168194 
09135056 


-04584851 
04324755 
-01142490 
-0°796378 
-0°507727 
-0°351686 
-05257910 
-0°197192 
05155640 
-0°125961 


$c_{31}$ (f = 4 to 1000)


7-51988 
2-97354 
1-72686 
1-18216 
0-886227 
-703441 
-580566 


-492853 
*427377 
+376782 
*336598 
-303961 
-276960 
*254271 
-234951 
-218311 
-203835 
-191133 
-179900 
-169897 

160935 
-152861 


°122131 

-101628 

-0869916 
-0760262 
-0675079 
-0551391 
-0465944 
-0432423 
-0403396 
0355638 
‘0317982 


-0207881 

-0154402 

-0101943 

-00760893 
-00606957 
-00504825 
-00432113 
-00377709 
00335473 
-00301732 


Note: an entry printed as 0⁶761727 should be read 0.000000761727, etc.


$c_{44}$ (f = 5 to 1000)


2-32912 

0-555627 
+220997 
-112216 
-0658634 
*0425458 


-0294140 
-0213878 
0161682 
-0126046 
-0100748 
-00822029 
-00682400 
-00574860 
-00490411 
-00422971 
-00368316 
-00323446 
-00286183 
-00254919 
-00228444 


-00141976 
-0°964543 
-0°696724 
-0°526289 
-0°411303 
08270773 
09191529 
09164338 
-09142542 
-0°110177 
-04876917 


04369838 
-04202676 
-0°877648 
-0°487310 
-0°309461 
05213793 
05156492 
-0°119482 
-0°942025 
-0°761727 


$c_{42}$ (f = 5 to 1000)


21-7058 
7-11961 
3°62848 
2-25553 
1-56996 
1-17491 


0-924366 
-754190 
-632497 
-541926 
-472333 
-417451 
+373228 
+336939 
-306697 
+281155 
+259332 
-240493 
+224085 
-209678 
-196937 


-150530 

+121426 

-101573 

‘0872081 
-0763532 
-0610691 
-0508439 
-0469076 
0435333 
-0380512 
-0337901 


-0216437 

0159134 

-0104011 

-00772428 
-00614302 
‘00509909 
-00435839 
-00380557 
-00337720 
-00303550 


$c_{40}$ (f = 5 to 1000)
25.0000

13-5000 
9-80000 
8-00000 
6-94286 
6-25000 


5-76190 
5-40000 
5-12121 
4-90000 
4-72028 
4:57413 
4-44615 
4-33929 
4-24706 
4-16667 
4-09598 
4-03333 
3°97744 
3°92727 
3°88199 


3°70879 
3°59238 
3°50877 
3°44583 
3-39674 
3°32512 
3°27540 
3°25584 
3°23887 
3°21089 
3-18878 


312384 
3-09215 
306095 
304553 
303634 
303024 
302589 
3°02263 
302010 
301808 
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Estimation of missing observations in split-plot experiments 
where whole-plots are missing or mixed-up 


By J. D. BIGGERS 
The Wistar Institute, Philadelphia 4, Pennsylvania 


SUMMARY 


The problem of estimating missing split-plots when entire whole-plots are missing is discussed. The 
simultaneous equations for estimating the missing values are singular and inconsistent, but they can be 
solved subject to the condition that the sum of the unknowns equals the value which minimizes the whole- 
plot error mean square. If some of the values are merely mixed-up a direct solution is always available. 


1. INTRODUCTION


This paper is concerned with estimating missing observations in split-plot experimental designs where 
some of the missing observations comprise whole-plots. In some types of experiment a common occur- 
rence is loss of a whole-plot. This may occur, for example, with a litter of mice in an experiment where 
each individual receives a split-plot treatment and each litter a whole-plot treatment. If the animals 
forming a litter are housed in the same cage some unexpected catastrophe, such as flooding from the 
water bottle, may destroy them during the experiment. Loss of an entire whole-plot may occur in organ- 
culture experiments where paired organs from an embryo are used, and the embryo becomes infected 
while it is being removed from the egg or uterus. The subject of estimating missing plots in split-plot 
designs where the whole-plots are arranged in randomized blocks and latin squares has been discussed 
in an earlier paper (Biggers, 1959). There, it was merely stated that when entire whole-plots are missing, 
the matrix of the equations for estimating the missing values is singular, and there is therefore no simple 
solution to the problem. Wilkinson (1958) has discussed briefly those cases where the equations are singu- 
lar and consistent, but has not considered split-plot designs where the estimating equations are both 
singular and inconsistent. 

In the present paper the discussion will apply specifically to split-plot designs where whole-plot 
treatments are arranged in randomized blocks, since the formulae which are derived can easily be applied 
to other types of design. 


2. NOTATION
Scalars—letters in italics. 
Vectors—lower case letters in Clarendon type. 
Matrices—capital letters in Clarendon type. 
Transposition of vectors and matrices is indicated by a prime. 
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3. THE PROBLEM 


Consider a design with r blocks containing t plots to which t whole-plot treatments are allotted at
random, and let s split-plot treatments be allotted at random within each whole-plot. We denote the
observation made on the kth split-plot treatment in the ijth plot by $X_{ijk}$ (i = 1, 2, ..., r; j = 1, 2, ..., t;
k = 1, 2, ..., s). At the end of the experiment let p split-plots be missing, denoted by the corresponding
letter in the lower case: $x_{ijk}$.

The analysis of variance of this design provides two error terms: the error derived from the analysis
of whole-plot totals, and the error derived from the analysis of individual observations made on the split-plots. We are concerned with estimating values for the missing observations which minimize the second
of these two error terms. It is well known that in this design missing values in a given whole-plot are
calculated only from the available observations which have been influenced by the allotted whole-plot
treatment. Thus we need only consider the estimation of missing values within a given whole-plot
treatment.

Suppose now that in the randomized block design all s observations on the split-plots in the fgth
whole-plot are lost. Let $I_{gk}$ be the total of the available observations receiving the gth whole-plot
treatment and the kth split-plot treatment, and $S_g$ be the total of the available observations receiving
the gth whole-plot treatment. The matrix equation for estimating values for these missing observations
which minimize the split-plot error sum of squares is given by


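\[
\begin{bmatrix}
(r-1)(s-1) & -(r-1) & \cdots & -(r-1) \\
-(r-1) & (r-1)(s-1) & \cdots & -(r-1) \\
\vdots & \vdots & & \vdots \\
-(r-1) & -(r-1) & \cdots & (r-1)(s-1)
\end{bmatrix}
\begin{bmatrix} x_{fg1} \\ x_{fg2} \\ \vdots \\ x_{fgs} \end{bmatrix}
=
\begin{bmatrix} sI_{g1} - S_g \\ sI_{g2} - S_g \\ \vdots \\ sI_{gs} - S_g \end{bmatrix}
\]

\[
=
\begin{bmatrix}
(s-1)I_{g1} - \sum_{k \ne 1} I_{gk} \\
(s-1)I_{g2} - \sum_{k \ne 2} I_{gk} \\
\vdots \\
(s-1)I_{gs} - \sum_{k \ne s} I_{gk}
\end{bmatrix}
=
\begin{bmatrix}
(s-1) & -1 & \cdots & -1 \\
-1 & (s-1) & \cdots & -1 \\
\vdots & \vdots & & \vdots \\
-1 & -1 & \cdots & (s-1)
\end{bmatrix}
\begin{bmatrix} I_{g1} \\ I_{g2} \\ \vdots \\ I_{gs} \end{bmatrix},
\]

since $S_g = \sum_k I_{gk}$. This equation may be written in abbreviated matrix notation

\[ \mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{i}/(r-1). \quad (1) \]

The circulant matrix A in equation (1) is singular and therefore cannot be cancelled from each side of
the equation; furthermore, the equations are inconsistent. Thus there is no set of values for the elements
of x which minimizes unconditionally the split-plot error sum of squares.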

It is possible, however, to minimize the error sum of squares in the analysis of whole-plot totals using
the formulae for estimating missing values in randomized block designs. Let the usual estimate of the
missing whole-plot treatment value of the fgth plot be $M_{fg}$. (In the sequel the subscript will be omitted
since no confusion can arise.) Then we can minimize the split-plot error sum of squares subject to the
condition that

\[ \sum_k x_{fgk} = M_{fg}. \quad (2) \]


4. SOME FUNDAMENTAL MATRICES 


We first give some properties of matrices which arise in the sequel. The d × d square matrix

\[
\begin{bmatrix}
(s-1)a & -a & \cdots & -a \\
-a & (s-1)a & \cdots & -a \\
\vdots & \vdots & & \vdots \\
-a & -a & \cdots & (s-1)a
\end{bmatrix}
\quad (3)
\]

is a circulant matrix, whose determinant equals $a^d s^{d-1}(s-d)$. Let C be a matrix of this type of order s.
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Then the determinant of C is zero, and the matrix is singular and of rank (s— 1). The adjugate matrix 
of C is an s × s matrix with all its elements equal to $a^{s-1}s^{s-2}$.
Consider now the bordered matrix

\[ \mathbf{B} = \begin{bmatrix} \mathbf{C} & \mathbf{u} \\ \mathbf{u}' & 0 \end{bmatrix}, \]

where u is an s × 1 column vector whose elements are either 0 or 1, which can be arranged in any order.
Let w of the s elements be 1 and (s − w) of them zero. It can be shown that

\[ \det \mathbf{B} = -\mathbf{u}'(\operatorname{adj}\mathbf{C})\,\mathbf{u} = -w^2 a^{s-1} s^{s-2}. \quad (4) \]

Thus this matrix is always non-singular, provided w ≠ 0, a ≠ 0, s ≠ 0.
An important special case of (4) is where w = s, i.e. all the elements of the vector u are 1. This matrix is

\[
\begin{bmatrix}
(s-1)a & -a & \cdots & -a & 1 \\
-a & (s-1)a & \cdots & -a & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
-a & -a & \cdots & (s-1)a & 1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}
\quad (5)
\]

and its inverse is given by

\[
\begin{bmatrix}
(s-1)/(as^2) & -1/(as^2) & \cdots & -1/(as^2) & 1/s \\
-1/(as^2) & (s-1)/(as^2) & \cdots & -1/(as^2) & 1/s \\
\vdots & \vdots & & \vdots & \vdots \\
-1/(as^2) & -1/(as^2) & \cdots & (s-1)/(as^2) & 1/s \\
1/s & 1/s & \cdots & 1/s & 0
\end{bmatrix}.
\quad (6)
\]
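
The stated inverse can be checked with exact rational arithmetic. The following Python sketch builds matrix (5) and the claimed inverse (6) for given s and a and multiplies them; it assumes that the border consists of ones with a zero in the bottom right-hand corner of both matrices, and the function names are hypothetical.

```python
from fractions import Fraction


def bordered_matrix(s, a):
    """Matrix (5): circulant block a*(s*I - J) bordered by ones, zero corner."""
    rows = [
        [Fraction(a) * ((s - 1) if i == j else -1) for j in range(s)] + [Fraction(1)]
        for i in range(s)
    ]
    rows.append([Fraction(1)] * s + [Fraction(0)])
    return rows


def claimed_inverse(s, a):
    """Matrix (6), with a zero assumed in the bottom right-hand corner."""
    denom = Fraction(a) * s * s
    rows = [
        [Fraction((s - 1) if i == j else -1) / denom for j in range(s)]
        + [Fraction(1, s)]
        for i in range(s)
    ]
    rows.append([Fraction(1, s)] * s + [Fraction(0)])
    return rows


def matmul(p, q):
    """Product of two square matrices stored as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(m):
    n = len(m)
    return all(m[i][j] == (1 if i == j else 0) for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(is_identity(matmul(bordered_matrix(4, 3), claimed_inverse(4, 3))))
```

Exact fractions are used so that the comparison with the identity matrix is not affected by rounding.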








5. CONDITIONAL MINIMIZATION OF THE SPLIT-PLOT ERROR SUM OF SQUARES 
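Equation (2) may be written in matrix notation

\[ \mathbf{1}'\mathbf{x} = M, \quad (7) \]

where 1 is a unit column vector of length s. We introduce an undetermined multiplier 2λ/N, giving the
Lagrange equations

\[ \frac{\partial}{\partial \mathbf{x}} \{F(\mathbf{x}) + 2\lambda(\mathbf{1}'\mathbf{x} - M)/N\} = 0, \]

where F(x) is the sum of squares for the split-plot error expressed as a function of the vector of missing
values. Introducing the notation used in equation (1) of the previous paper (Biggers, 1959) and equation
(1) of the present paper, we find that

\[ 2\{\mathbf{x}'\mathbf{A} - \mathbf{i}'\mathbf{A}/(r-1)\}/N + 2\lambda\mathbf{1}'/N = 0. \]

Transposition and rearrangement gives

\[ \mathbf{A}\mathbf{x} + \lambda\mathbf{1} = \mathbf{A}\mathbf{i}/(r-1). \quad (8) \]

The values of $x_{fgk}$ (k = 1, 2, ..., s) are given by the simultaneous solution of (7) and (8). These two matrix
equations may be combined to give

\[
\begin{bmatrix} \mathbf{A} & \mathbf{1} \\ \mathbf{1}' & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{x} \\ \lambda \end{bmatrix}
= \frac{1}{r-1}
\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0}' & (r-1) \end{bmatrix}
\begin{bmatrix} \mathbf{i} \\ M \end{bmatrix}.
\quad (9)
\]

The matrix on the left side of this equation is of the form (5) and will always be non-singular. Thus
equation (9) has a unique solution given by

\[
\begin{bmatrix} \mathbf{x} \\ \lambda \end{bmatrix}
= \frac{1}{r-1}
\begin{bmatrix} \mathbf{A} & \mathbf{1} \\ \mathbf{1}' & 0 \end{bmatrix}^{-1}
\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0}' & (r-1) \end{bmatrix}
\begin{bmatrix} \mathbf{i} \\ M \end{bmatrix}.
\quad (10)
\]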
Expansion of (10) with respect to the $x_{fgk}$'s gives


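\[
\begin{bmatrix} x_{fg1} \\ x_{fg2} \\ \vdots \\ x_{fgs} \end{bmatrix}
= \frac{1}{s(r-1)}
\begin{bmatrix}
(s-1) & -1 & \cdots & -1 & (r-1) \\
-1 & (s-1) & \cdots & -1 & (r-1) \\
\vdots & \vdots & & \vdots & \vdots \\
-1 & -1 & \cdots & (s-1) & (r-1)
\end{bmatrix}
\begin{bmatrix} I_{g1} \\ I_{g2} \\ \vdots \\ I_{gs} \\ M \end{bmatrix}.
\quad (11)
\]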








Thus once an estimate of M has been obtained by an analysis of the available whole-plot totals, the 
values of the missing split-plots can be calculated directly using equation (11). Equation (11) allows us, 
however, to write formulae for a few simple cases which can be used directly. 


(a) s = 2. $x_{fg1} = (I_{g1} - I_{g2} + (r-1)M)/\{2(r-1)\}$,
           $x_{fg2} = (I_{g2} - I_{g1} + (r-1)M)/\{2(r-1)\}$.


In this case it is usual to avoid the problem just discussed by analysing separately the sums and dif- 
ferences of the split-plots in each whole-plot. Thus in each analysis it is only necessary to estimate values 
for the missing function using the formula for a randomized block design. Often, however, it is preferable 
to keep the data in its original form, and use the formulae just given. 


(b) s = 3. $x_{fg1} = (2I_{g1} - I_{g2} - I_{g3} + (r-1)M)/\{3(r-1)\}$,
           $x_{fg2} = (2I_{g2} - I_{g1} - I_{g3} + (r-1)M)/\{3(r-1)\}$,
           $x_{fg3} = (2I_{g3} - I_{g1} - I_{g2} + (r-1)M)/\{3(r-1)\}$.
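
For general s, formula (11) amounts to x_fgk = (sI_gk − S_g + (r − 1)M)/{s(r − 1)}, where S_g is the sum of the I_gk. A minimal Python sketch follows; the function name `missing_split_plots` and the numbers in the usage example are hypothetical and serve only as an illustration.

```python
def missing_split_plots(totals, r, M):
    """Conditional estimates of the s missing split-plots in one whole-plot.

    totals -- list [I_g1, ..., I_gs] of available totals for each split-plot
              treatment within the whole-plot treatment concerned
    r      -- number of blocks
    M      -- estimate of the missing whole-plot total
    """
    s = len(totals)
    if r < 2 or s < 2:
        raise ValueError("need r >= 2 blocks and s >= 2 split-plot treatments")
    S = sum(totals)
    # s*I_k - S equals (s-1)*I_k minus the sum of the other totals.
    return [(s * I_k - S + (r - 1) * M) / (s * (r - 1)) for I_k in totals]


if __name__ == "__main__":
    estimates = missing_split_plots([30.0, 36.0, 42.0], r=4, M=33.0)
    print(estimates, sum(estimates))
```

The estimates always sum to M, as required by condition (2), because the terms sI_gk − S_g sum to zero over k.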


6. THE CASE WHERE SOME OF THE VALUES ARE MIXED-UP 


A more general problem is the case where (s—w) split-plot observations in the fgth whole-plot are 
missing and the remaining w are mixed-up. Let the sum of the mixed-up values equal $H_{fg}$. In this case
we can always use the method for estimating mixed-up values given previously (Biggers, 1959) and 
obtain a unique solution for both the mixed-up and the completely missing observations. In this pro- 
cedure we minimize the error mean square, subject to the condition that the total of the mixed-up values 
equals H. The matrix is a bordered matrix of the form given by (5), where u contains both zero and unit 
elements. Since this matrix is non-singular in all practical cases, it always has an inverse, and so unique 
values can be calculated for all the missing observations. 


7. MORE COMPLEX CASES 


Consider an experiment where the fgth whole-plot is entirely missing, and the bcth whole-plot has
some of its s split-plots also missing. The estimation of missing values is done in three stages.

(i) Estimate the missing values in the bcth whole-plot using the formula given in the previous paper
(Biggers, 1959). The matrix of the estimating equations will be non-singular and so a unique solution is
always possible.

(ii) Calculate the total of the bcth whole-plot including the estimates of the missing observations in
(i), and use the total in calculating an estimate of the fgth whole-plot total by the formula for the randomized block design.

(iii) Proceed with calculating the conditional estimates of the split-plots in the fgth whole-plot given
by formula (11). Extensions to more complex cases are obvious.

Consider now an experiment where the fgth whole-plot is entirely missing and missing observations 
occur in other whole-plots subjected to the same whole-plot treatment. The matrix of the estimating 
equations is singular in these cases and in order to solve the equations they will have to be minimized 
conditionally. However, the matrix will not be of the form (5), and will have to be inverted as a special 
problem. 


8. REDUCTION OF THE NUMBER OF DEGREES OF FREEDOM 


The number of degrees of freedom which should be deducted from the number of degrees of freedom 
for error is determined by the number of constraints placed on the data. The number varies with the 
problem. In general, the number equals the number of missing or mixed-up observations minus the 
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number of sets of mixed-up observations. This applies irrespective of whether or not the matrix is 
non-singular or singular. If there are several submatrices in the complete solution corresponding to 
different whole-plot treatments, then the total number of degrees of freedom to deduct is the sum of 
the numbers associated with each submatrix. 
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Deterministic customer impatience in the queueing system GI/M/1; a correction 


By P. D. FINCH 
University of Melbourne 


1. INTRODUCTION 


In a recent paper with the above title (Finch, 1960), certain paragraphs were inadvertently left out
of the revised version of the manuscript. This has led to an ellipsis in the argument leading to the
fundamental equation (3) of that paper. I am indebted to Mr D. G. Kendall for pointing out the existence 
of this ellipsis. The argument of the next section should come between equations (1) and (3) of Finch 
(1960). 


2. THE CORRECTION 


Note first that an impatient departure occurs always at the head of the queue. For, if at any time the
arrivals at $t_n$, $t_{n+1}$ are each waiting for service and if $w'_n$, $w'_{n+1}$ are, respectively, the times they have been
waiting then $w'_n = w'_{n+1} + \tau_{n+1}$. Thus the arrival at $t_{n+1}$ cannot depart impatiently and leave the arrival
at $t_n$ still waiting for service.

If $w_n < W$ it is easy to see that

\[
w_{n+1} =
\begin{cases}
0 & \text{if } w_n + u_n \le 0, \\
w_n + u_n & \text{if } 0 < w_n + u_n < W, \\
W & \text{if } W \le w_n + u_n.
\end{cases}
\quad (2)
\]

If $w_n = W$, then on the impatient departure of the nth customer some earlier customer is still receiving
service; let the remaining service time of this customer be $s'_n$. Then it is not difficult to show that, if
$w_n = W$,

\[
w_{n+1} =
\begin{cases}
0 & \text{if } W + u'_n \le 0, \\
W + u'_n & \text{if } 0 < W + u'_n < W, \\
W & \text{if } W \le W + u'_n,
\end{cases}
\quad (2)'
\]

where $u'_n = s'_n - \tau_{n+1}$. Note that (2)′ is the same as (2) with $w_n = W$ and $u_n$ replaced by $u'_n$.

Because the service time distribution is exponential, the random variable $s'_n$ has the same exponential
distribution as service time and is independent of $\tau_{n+1}$, $w_n$. Thus $u'_n$ has the same distribution as $u_n$.
Thus if we define $w_{n+1}$ by equation (2) in the case $w_n = W$ also, the distribution of $w_n$ will be that of the
waiting time of the nth customer.

Thus we obtain the equation (3) of Finch (1960).
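
The recursion (2) is easy to simulate. The Python sketch below is only an illustration: it assumes that $u_n$ is an exponential service time minus the next inter-arrival time, that the first customer does not wait, and that a value of W marks an impatient departure; the function name `simulate_waits` and its parameters are hypothetical.

```python
import random


def simulate_waits(n, W, mean_service, draw_interarrival, seed=0):
    """Simulate w_1, ..., w_n from w_{n+1} = min(W, max(0, w_n + u_n)).

    draw_interarrival -- function taking a random.Random instance and
                         returning one inter-arrival time (the GI input)
    """
    rng = random.Random(seed)
    w = 0.0            # assumed: the first customer finds the system empty
    waits = []
    for _ in range(n):
        waits.append(w)
        # u_n: exponential service time minus the next inter-arrival time.
        u = rng.expovariate(1.0 / mean_service) - draw_interarrival(rng)
        w = min(W, max(0.0, w + u))
    return waits


if __name__ == "__main__":
    waits = simulate_waits(
        10000, W=2.0, mean_service=1.0,
        draw_interarrival=lambda rng: rng.uniform(0.5, 1.5),
    )
    impatient = sum(1 for w in waits if w >= 2.0) / len(waits)
    print("fraction of impatient departures:", impatient)
```

Because the same rule is applied whether or not $w_n = W$, the simulation relies on the memoryless property of the exponential service time in the way described above.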


3. THE LIMITING DISTRIBUTION 


The sequence of random variables $\{w_n\}$ forms a Markov chain with state space the finite interval
(0, W). In Finch (1960) it was stated that it follows from the general theory of Markov chains that a
limiting distribution $F(x) = \lim_{n\to\infty} \Pr(w_n \le x)$ exists and is independent of the initial distribution
$\Pr(w_1 \le x)$. Mr D. G. Kendall has pointed out that this limit theorem does not follow from general
theory, which establishes only that

\[ \lim_{n\to\infty} n^{-1} \sum_{j=1}^{n} F_j(x) \]

exists.

A proof of the existence of a limiting distribution for the sequence $\{F_n(x)\}$, where $F_n(x) = \Pr(w_n \le x)$
and the sequence $\{w_n\}$ is given by (2), will be found in Finch (1961).
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Corrigenda 
(1) Biometrika (1957), 44, pp. 399-410 
‘Further contributions to multivariate confidence bounds’ 
By S. N. ROY and R. GNANADESIKAN
p. 400, equation (2-3) and the following line should read 
a(ISeT)a aM Dyy Si" Dyy TA, (TSS Ta 


w 2. TESS, Ee cal ago tS. 
2 a’a " a’a =e a’a 


for all non-null a(p x 1)’s, where S = T'7” and T is a lower triangular matrix. 
(2) Biometrika (1958), 45, pp. 411-20 
‘Moment estimators and maximum likelihood’ 


By L. R. SHENTON





p. 412, equation (6) read a for ae 
line 6 read 0 for 86. 
lines 10, 12, 14 read @ for 9@. 
p. 415, equation (18) read A*a* for (Aa)*, 
line 21, read As(s—1)q,_,(x)/a*? for As(s—1)q,_,(zx), 
line 24 insert Pon the right-hand side of equation (22). 


p. 419, equation (35) should read 
1 A UL+a-1A)A(L+a-) 2(1 +a-2A) A(1 + 20-4) 


$ Pe 1 A Mite 
ea0ott+2 2+1+ 2+ 1+ 2+ 1+... 


(3) Biometrika (1959), 46, pp. 483-6 
‘Extrema of quadratic forms with applications to statistics’ 


By K. A. BUSH and I. OLKIN




The authors are indebted to D. G. Kendall for pointing out an error in (1’) on p. 485. 


(a) The paragraph including this formula should read 


By making the correspondence $w = x$, $A = M$, $a(B'A^{-1}B)^{-1}B' = y$, we obtain the bound

\[ wAw' \ge \frac{[wB(B'A^{-1}B)^{-1}a']^2}{a(B'A^{-1}B)^{-1}a'} = a(B'A^{-1}B)^{-1}a', \]

with equality holding if and only if $w = \alpha a(B'A^{-1}B)^{-1}B'A^{-1}$. Because of the constraint
$wB = a$, we also have $\alpha = 1$. (Note that $B'A^{-1}B$ is non-singular by conditions I.)











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(b) p. 486 (ii) should read 
(ii) The solution is given directly by (1′) with $A = I$, $a = e$: $1 \times (m+1)$, and B replaced
by B′:

\[ ww' \ge e(BB')^{-1}e', \]

with equality holding if and only if $w = e(BB')^{-1}B$. This is equivalent to the solution of
Karush and Wolfsohn, and presents an alternative expression for their result.


K. A. BUSH and I. OLKIN


(4) Biometrika (1960), 47, pp. 263-71 
‘Supplemented balance’ 
By S. C. PEARCE
p. 268, six lines from the bottom 
1 MAo]\* (1 1oAo]\° 
rad tro") lo") 


in place of the expression given. 


(5) Biometrika (1960), 47, pp. 393-8 
‘On the power function of the exact test for the 2 x 2 contingency table’ 
By B. M. Bennett and P. Hsu 


p. 395, five lines from the bottom 


read m,=min(r,B) for m, = max(r, B).


(6) Biometrika (1961), 48, pp. 95-108 
‘Significance tests for paired-comparison experiments’
By T. H. Starks and H. A. Davip 


p. 97, equation (13) read 28"-2nlt+1 for 23n—2n+1, 
p. 99, equation (26) should read 
inj.% (n(t — 
f(a Bai sens a,.) = 2-}ns(t-s—l) > Tl e °) ’ 
p \@p)—4p 
p. 102, in last line of footnote to Table 2, 
read o = 0-00005,/(k/3) for o = 0-00005(k/3). 
p. 105, equation (51) read Q}/S,<D, for Q}/S;,<D. 


p. 106, §8-2. In(3) read mj, for m,. 


































(7) Biometrika (1961), 48, pp. 115-23 


‘The use of orthogonal polynomials of positive and negative binomial 
frequency functions in curve fitting by Aitken’s method’ 


By H. T. GONIN


p. 120, line 4 read with mean as origin for with zero origin. 


(8) Biometrika (1961), 48, pp. 151-65 
‘Expected values of normal order statistics’ 
By H. Leon Harter 


The author wishes to thank Churchill Eisenhart for calling to his attention the inexcusable 
failure to mention, in the historical section, the twenty-decimal-place tables of expected 
values of order statistics and products of order statistics for samples of size twenty and less 
from the normal distribution computed by J. Barkley Rosser. A ten-decimal-place version
of these tables was published by Teichroew [Annals of Mathematical Statistics (1956),
27, pp. 410-26].


(9) Biometrika (1961), 48, pp. 181-90 
‘Test of independence in intraclass 2 x 2 tables’


By M. OKAMOTO and G. ISHII





p. 181, equation (2) read pzg=py for pyq=Pi- 

p. 184. fourteenth line of §4 read 2np for 2p.
sixteenth line of §4 read N(np*,np*q") for N(2np?, np*q?/4). 
equation (14) read 2n-—3 for 2n-—s. 


(10) Biometrika (1961), 48, pp. 197-9 


‘On Durbin’s formula for the limiting generalized variance of a sample 
of consecutive observations from a moving-average process’ 


By A. M. WALKER 


min (i,j) h 
p. 197, three lines from the bottom read > vivi for > bbe. 
r=1 


r=1 
p. 198,line8 read bY=a;; for bd =a,_,. 


line 9 read a, 0; fOr Gp; : 
j i 

line 10 read > 4,01» 4; for hp hnin—i4je 
r=0 r=0 


five lines from the bottom 


read > (3 Arxs) [0 for > ( % Buti) [ot 


t=h+1 \Vi= t=h+1 \Vi= 
















Biometrika (1961), 48, 3 and 4, p. 477 
Printed in Great Britain 


Reviews 


An Introduction to Statistical Communication Theory. By DAVID MIDDLETON.
New York, Toronto, London: McGraw-Hill Book Company, Inc. 1960. Pp. 1140. 
£9. 14s. 


In the preface the author defines statistical communication theory as ‘a theory which applies 
probability concepts and statistical methods to the communication process’, and this definition 
indicates the interest of this book to the modern statistician, who will find most of the more recent 
developments in statistics, such as decision theory, information theory and stochastic processes, 
discussed and made use of. The reviewer was somewhat surprised by the author’s insistence on calling 
this work of well over a thousand pages ‘an introduction’, for the treatment of the various topics is 
usually very detailed, with an impressive coverage of the relevant literature. However, the author 
explains that various important topics have been omitted, including linear and non-linear feedback 
and control systems, coding methods, sequential statistical methods, and certain aspects of noise 
theory. 

The book is divided into four main parts; the first part (six chapters) develops the necessary statistical 
theory, including that of stationary time-series, noise, and information theory. The second part (five 
chapters) discusses random noise processes at greater length, dealing in particular with normal pro- 
cesses, and functionals and expansions derived from them. It contains a chapter on the Langevin 
type of equation, i.e. the direct stochastic equation of the process, and the alternative Fokker—Planck 
type of equation, which refers to the changing distribution for the process and is a special case of the 
fundamental ‘forward’ and ‘backward’ equations of Kolmogorov. 

The third part (six chapters) deals more specifically with special communications systems, such as 
amplitude and frequency modulation. The statistician will no doubt risk skipping some of these more 
technical sections, though he should not overlook an exposition of linear prediction and optimum 
filtering included in this part. The fourth part (six chapters) contains a detailed discussion of signal
detection and reception, with a formulation first in general Bayesian terms and then in terms of
specific alternative procedures. The problem is of course similar in principle to discrimination procedures
generally, and the likelihood ratio criterion is generalized to incorporate prior probabilities and cost 
functions in order to provide the optimum decision rule. It is noted that two of the particular pro- 
cedures discussed, the Neyman—Pearson system (in which error ‘of the second kind’ is minimized for 
given risk of error ‘of the first kind’) and the so-called Ideal system (in which both types of possible 
error are treated symmetrically) can be regarded as special cases of Bayes procedures. 

The reviewer occasionally felt in reading this book that he was in danger of not seeing the wood 
for the trees; while the author does warn us that a priori signal probabilities may be hard to obtain, 
each reader must be prepared to make his own decisions and judgements on the relative value of the 
various alternative decision procedures discussed. 

However, it would seem rather unfair to criticize the book for containing so much, and while most 
statisticians may think twice before buying a book at this price, they can be assured that there is 
plenty in this volume to engage their attention if they are able to procure a copy. 

M. S. BARTLETT 


Introduction to Linear Programming. By WALTER W. GARVIN. New York, Toronto
and London: McGraw-Hill Book Company, Inc. 1960. Pp. 281. 68s. 


This book is based on a course of lectures given to scientists, engineers and economists of the Standard 
Oil Company of California, to teach them how to use linear programming, and in particular how to 
recognize and formulate linear programming problems. The book should be equally valuable to 
people interested in other fields of application, who, without necessarily being professional mathe- 
maticians, want to understand the mathematical basis of the subject—which is in fact fairly elementary. 

Part I (80 pp.) discusses the general linear programming problem. It has five chapters entitled 
Introduction, The Simplex Method, The Computational Procedure, Sensitivity Analysis, and A Gasoline 
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Blending Problem. Part II (65 pp.) discusses the transportation problem and its variants. It has five 
chapters entitled The Transportation Problem, Unbalance and Trans-shipment, Assignment Problems 
and the Caterer Problem, A Tanker-routing Problem, and the Generalized Transportation Problem. 
Part III (118 pp.) entitled Special Methods, has eight chapters entitled Upper Bounds, Statistical 
Linear Programming, The Revised Simplex Method, Resolution of Degeneracy, Parametric Linear 
Programming, A Simple Economic Model, Duality, and the Warehouse Problem. Finally, there are 
four pages of references and a seven-page index. 

The author is concerned exclusively in this book with Dantzig’s Simplex Method and modifications 
of it. This seems entirely reasonable, but I think he could usefully have included a chapter on Lemke’s 
Dual Method. This deserves to be justified in its own right, and not merely as the Simplex Method 
applied to the dual problem, both because of its fundamental role in Integer Programming and other 
types of non-linear programming, and—perhaps more relevantly—because of its value in what the 
author calls ‘Sensitivity Analysis’, i.e. the study of the effects on the solution of changes or additions 
to the assumptions of the problem. Indeed, Orchard-Hays’s parametric linear programming procedure, 
described in Chapter 15, can most easily be thought of as using the dual method in the way that 
Gass and Saaty use the original simplex method to solve for a parametric objective function; although 
historically it was developed independently. 

The chapter on the warehouse problem is included to describe an application of the duality principle. 
I did not enjoy it because, as A. J. Hoffman pointed out in a private communication, the problem can 
be solved much more simply by transforming it into a special case of the purchase-storage problem 
discussed by Beale, Morton and Land, ‘Solution of a Purchase-Storage Programme’ Operational 
Research Quarterly, Vol. 9, pp. 174-97 (1958). (Each time period is subdivided into a selling period 
in which requirements not filled from the warehouse are imagined to be satisfied by purchase at the 
selling price, followed by a purchase period with zero requirements.) 

The other weak chapter is that on statistical linear programming. This field includes several problems. 
The constraints may be known but some coefficients in the objective function may be random variables. 
These coefficients may be replaced by their mean value if one is content to optimize the mean value 
of the objective function, but one is faced with a (standard type of) quadratic programming if one 
wants to minimize the variance of the objective function for a given mean. This problem has been 
studied by H. M. Markowitz in connexion with investment portfolio selection. Other problems arise 
if some coefficients in the constraints are regarded as random variables. One may be interested in 
the resulting distribution of the optimum value of the objective function. But often the key decisions 
have to be taken before all the values of the unknowns are known. Some of the variables may then 
have to be selected knowing only the distributions of the random elements, and one may ask how 
this can be done to optimize the mean value of the objective function, assuming the other variables 
are appropriately chosen. Without going into these distinctions, the author discusses a problem of this 
last type, which is made linear by approximating the distributions of the random variables (actually 
the right-hand sides) by discrete distributions. 
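
The first two cases distinguished above can be written out as follows. This is only a sketch, in matrix notation that is not taken from the book: x is the vector of variables, A and b are the known constraint data, and the objective coefficients c are assumed random with mean vector \mu and covariance matrix V (all of these symbols are illustrative assumptions).

```latex
% Known constraints; random objective coefficients c with E[c] = \mu, Cov(c) = V,
% so that E[c^T x] = \mu^T x and Var(c^T x) = x^T V x.

% (a) Optimizing the mean value of the objective: an ordinary linear programme.
\max_{x} \; \mu^{\mathsf T} x
\quad \text{subject to} \quad A x = b, \quad x \ge 0 .

% (b) Minimizing the variance for a given mean m: a quadratic programme.
\min_{x} \; x^{\mathsf T} V x
\quad \text{subject to} \quad \mu^{\mathsf T} x = m, \quad A x = b, \quad x \ge 0 .
```

Problem (b) has the form of the portfolio selection problem mentioned above, with x read as the amounts invested.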

In spite of these blemishes, the book is a good introduction to linear programming. For the sake of 
readers with a limited mathematical background, the author avoids matrix notation. This does not 
prevent him from giving a detailed and lucid account of the ‘Revised Simplex’ or ‘Inverse Matrix’ 
method of organizing computations (though he does not discuss the product form of the inverse used 
in most large-scale computer programs for linear programming). 

Mathematics are not avoided, but are accompanied by a non-technical commentary on what is 
going on, and why. There are also many numerical examples, and problems for the reader after each 
chapter. E. M. L. BEALE 


Applied Statistical Decision Theory. By Howard Raiffa and Robert Schlaifer. 
Harvard Business School. London: Bailey Bros. and Swinfen Ltd. 1961. Pp. xxviii+356. 76s. 


This book is addressed to persons who desire to employ statistical methods in practical problems of 
decision making. Although the approach is from first principles with regard to decision theory, it is 
essential that the reader be quite familiar with statistical methods and concepts in addition to possessing 
a sound mathematical background. 

The work is in three parts: general theory, analysis when sampling and terminal utilities are ad- 
ditive, and distribution theory. It reads with the dryness of a handbook. This does not detract from 
its value; as a handbook it is excellent and the reviewer has no hesitation in recommending it as a 
work of reference for those who are already reasonably conversant with the commercial problems of 
decision making. On the other hand it is felt that the book will not be of immediate value to those 
looking at the subject for the first time. Decision theory is a very interesting branch of applied statistics 
and presents some interesting problems for the theorist. It is perhaps a pity that the authors did not 
consider it necessary to introduce a series of examples and applications since this might well have 
increased the book’s area of appeal. J. G. SAW 


Reviews 


Strategy and Market Structure. Competition, Oligopoly, and the Theory of 
Games. By Martin Shubik. Foreword by Oskar Morgenstern. London: Chapman 
and Hall; New York: John Wiley and Sons. 1959. Pp. 387. 64s. 


The traditional problems of economics are the allocation of resources and the distribution of income. 
These problems require a theory of relative prices (of factors as well as final products). The great 
triumph of nineteenth-century economics was to show that profit maximization by many independent 
producers in atomistic competition yielded a system of resource allocation and income distribution 
which depended directly upon the ‘real’ situation of the economy—tastes, factor supplies, and the 
technologically given possibilities of production. This solution could be ‘distorted’ by monopoly; but, 
while monopoly presented economists with a normative problem, the positive problem presented no 
theoretical difficulty because cost and demand functions could still be taken as given by the real 
forces of the economy (the former by technical possibilities and factor prices, the latter by consumers’ 
tastes and incomes). Consider, however, the case of a small number of firms in close competition with 
each other. Simple maximization of the difference between two given functions will not do: the price 
and output (and quality and advertising rates) of each firm affect the others, whose reactions must 
be allowed for in the formulation of policy. Now profit maximization ceases to be a simple concept, 
and relative prices, allocation, and income distribution depend upon market structure and the policy 
of firms towards their rivals as well as upon real forces. 

We have faced this oligopoly problem for a long time without making much of it, but since the 
publication of von Neumann and Morgenstern’s Theory of Games and Economic Behaviour we have 
been told repeatedly that the solution lies in game theory. Up till now, there has been plenty of 
propaganda but not much pay-off. Now a most distinguished game theoretician, Prof. Shubik, has 
made a full-dress onslaught on the problem in his Strategy and Market Structure; and it is obviously 
important to assess its success. This, I confess, I find remarkably difficult. The book is able, powerful, 
even exciting, but seems to leave two fundamental questions unanswered: (a) (assuming that we are 
still interested in allocation and income distribution) what propositions does the theory yield on these 
matters? (b) how might the theory be tested? At times it seems that the answer to (a) is ‘nothing’ 
and to (b) ‘never’ because there are so many ‘ifs’ floating around (as Prof. Shubik recognizes, p. 226); 
but this is only an impression, and it is to be hoped that it is a wrong one. Prof. Shubik does not, 
in fact, address himself to these questions—it would probably require a second volume to do this, and 
it is to be hoped that he will write it. In the meantime one can only guess about the answers. 

What Prof. Shubik has done in this volume is to offer, first, a critique of existing oligopoly theory 
from the point of view of game theory and, secondly, his own dynamic theory of oligopoly, the ‘game 
of economic survival’ (plus two chapters of ‘application’). The first part serves to introduce the reader 
to game theory and to show how most existing theories of oligopoly can be made to ‘fall out’ as 
particular cases of game theory. My only serious criticism of this part (apart from having to plod 
through Cournot, Bertrand et al. once again, inescapable I suppose, for heuristic reasons, but tiresome 
none the less) is that Shubik, in spite of his opening remarks about the desirability of testable content, 
overlooks the well-known method of obtaining testable predictions from static theory by differentiating 
the solution values of the variables with respect to the parameters. 

The second part, in which Prof. Shubik presents his games of economic survival, is the most arresting 
part of the book. This treatment is extremely powerful, and enormously extends the range of behaviour 
and initial conditions that can be handled systematically. The asset structure of the firm appears in 
the theoretical structure, instead of merely in the long list of things that economic theory ‘ought to’ 
comprehend (this list is the stock-in-trade of any institutionalist critic of current theory: the point is 
that Prof. Shubik can relate market behaviour to factors like asset structure and the structure of 
control and ownership in a systematic, analytic way). And this part is full of illuminating ideas, 
testimony to the power of the analysis, of which the following are examples. 
(1) (p. 257). The lower the interest rate, the greater the incentive for the strong firm to start a ‘war 
of ruin’ in order to secure a monopoly position (because, in weighing the prospective profits from the 
exploitation of the monopoly against the current losses of the war, the former need not be so heavily 
discounted). 

(2) (Chapter 11). The stability of a market equilibrium depends on the power of players to make 
plausible threats of ‘police action’ against potential disturbers of the equilibrium, and may be defined 
in terms of the number of firms who would have to co-operate to make the threat effective. 

(3) (p. 243). A firm’s ‘vulnerability’ is defined as the derivative of its profits with respect to a 
strategy change by the rival. The ‘damage exchange rate’ is the ratio of the change in the rival’s profits 
to the change in profit of the firm which adopts a ‘damage’ strategy. (If the strategy is a price cut 
which the rival firm must follow, the ‘damage exchange rate’ necessarily favours the smaller firm 
which loses revenue on fewer units of output. This is the consequence of defining the rate in absolute 
terms; it might be better to define it in elasticity terms, i.e. as the ratio of the percentage loss of profit 
by firm i to the percentage loss by firm j for j’s change in strategy. In any case, its importance to the 
plausibility of threats is obvious.) 
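
The point made in example (3) about absolute versus elasticity terms can be illustrated numerically. The sketch below is not from the book: the function names, outputs and profit figures are hypothetical, and it assumes that both firms' outputs and costs are unchanged by a matched price cut, so that each firm's loss of profit is simply the cut multiplied by its output.

```python
def damage_exchange_rate(d_profit_rival, d_profit_attacker):
    """Absolute rate: change in the rival's profit per unit change
    in the profit of the firm adopting the 'damage' strategy."""
    return d_profit_rival / d_profit_attacker


def relative_damage_rate(profit_rival, d_profit_rival,
                         profit_attacker, d_profit_attacker):
    """Elasticity-style rate: ratio of the two proportional changes in profit."""
    return (d_profit_rival / profit_rival) / (d_profit_attacker / profit_attacker)


# Hypothetical figures (assumptions for illustration only).
price_cut = 1.0                                # cut by the small firm, matched by the rival
small_output, large_output = 100.0, 400.0      # units sold, assumed unchanged
small_profit, large_profit = 500.0, 2000.0     # profits before the cut

# With output and costs fixed, each firm's profit falls by cut * output.
d_small = -price_cut * small_output
d_large = -price_cut * large_output

abs_rate = damage_exchange_rate(d_large, d_small)
rel_rate = relative_damage_rate(large_profit, d_large, small_profit, d_small)

print("absolute damage exchange rate:", abs_rate)
print("relative (percentage) rate:  ", rel_rate)
```

With these figures the small firm inflicts four units of loss on its rival for every unit it loses itself, yet both firms lose the same percentage of their profits. The absolute rate favours the smaller firm simply because it sells fewer units.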

The book ends with a discussion of the American automobile and tobacco industries in terms of the 
game theory of Part II, and some reflexions on the Anti-Trust Laws. Both chapters are interesting 
and illuminating; but is the whole apparatus of game theory necessary to justify the conclusion that 
General Motors is the dominant firm in the automobile industry, could afford competitive price cuts 
that would ruin all rivals save Ford, but dare not because of the Sherman Act? Is it necessary to 
justify the conclusion that the major cigarette firms are so much of-a-muchness that it wouldn’t pay 
any of them to start a ‘war to the ruin’? (We get the good point here that major firms may be able 
to get rid of minor ones without warfare, merely by accepting a stable price and leaving random 
fluctuations to take care of the tiddlers.) 

It only remains to say that Strategy and Market Structure is, in the opinion of this reviewer, an 
important and very able book. It may not yet justify all the claims that have been made for game 
theory, but it is an essential preliminary to testing them. 

(There are a few minor lapses that might be removed in a second edition. The Frisch-Allen concept 
of ‘conjectural variation’, analysed in the latter’s Mathematical Analysis for Economists, is worth 
explicit mention. In Chapter VI.2 the matter of imputed rent appears to have been overlooked. 
Throughout that chapter the unusual assumption is made that entry is a function of price instead of 
profit. In Fig. 9, p. 76, not all the curves are labelled. Fig. 1, p. 15, which is supposed to elucidate 
an elementary point in game theory, remains a mystery to me. On pp. 148-9 a kinked oligopoly demand 
curve is described as dynamic: it is merely a special case of conjectural variation. The English is often 
heavy going—words like ‘effectivity’ and ‘interconnectivity’ are not necessary.) G. C. ARCHIBALD 


Regression Analysis. By R. L. Plackett. Oxford: Clarendon Press and Oxford 
University Press. 1960. Pp. 173. 35s. 


The book discusses the various aspects of Least Square theory in connexion with its application to 
the analysis of experimental data. A thorough understanding of the content demands a familiarity 
with the theory of matrices and complex integration, probably at the level of the senior mathematics 
undergraduate, and with these mathematical tools Plackett develops the theory clearly and concisely. 
As well as the anticipated topics (linear hypotheses, polynomial regression, stationary error processes, 
symmetrical factorial experiments) the book contains other information such as methods for inverting 
a matrix and the effect of departure from standard test conditions. If the entirely adequate manner 
in which the more familiar theory is discussed were not enough to recommend it, such sections as those 
on the condition for the independence of two quadratic forms in normal variables or the power of the 
variance-ratio test, taken in conjunction with the interesting examples and list of references at the end 
of each chapter would serve to make Regression Analysis a valuable addition to the bookshelf. 


J. G. SAW 


Probability. An Introduction. By Samuel Goldberg. U.S.A. and London: Prentice-Hall, 
Inc. 1960. Pp. 322. 54s. 


If the student, reading this book without the added help of a lecture course, finds the going heavy 
then it is because of the subject-matter itself and not the fault of the author who discusses each topic 
with care and clarity and often with a relieving humour. 

Throughout the book, populations are held to be finite so that the differential and integral calculus 
is never called into use. Work begins with a discussion of the definition and algebras of sets and advances 
from these foundations to definitions and methods of manipulating probabilities including conditional 
and joint probabilities; the mean and variance of a population are also defined and discussed. In the 
last part of the book, the author discusses and develops the theory of Bernoulli trials and the Binomial 
distribution and introduces the idea of testing a hypothesis statistically by referring to the standard 
‘quality control’ techniques when sampling for defectives. The worked examples are plentiful and 
entertaining and the exercises seem carefully designed to test and increase the reader’s appreciation 
of each section. Answers and hints to aid the solution of half these exercises are given at the end of 
the book. 

Probability is perhaps ideally suited to the undergraduate in mathematics as a companion to a course 
on the applications of set theory. The statistics student has to concern himself with a series of in- 
creasingly sophisticated applications of probability theory, however, and it is perhaps quite common 
nowadays to invoke his almost intuitive concept of probability and to proceed from there. If this be 
the case, then such a student may well be recommended to read this book at the end of his first year 
of statistics, first to gain an appreciation of the logical argument and secondly to find firm ground 
for the future development of the subject. J. G. SAW 


Elementary Statistics. By Paul G. Hoel. New York and London: John Wiley and 
Sons, Inc. 1960. Pp. 261. 44s. 


The author writes in the preface that his book is ‘designed for a one-semester course for students 
whose background in mathematics is limited to high-school algebra’. Consequently this book will 
come as a shock to many since it would seem that the high-school graduate is unequipped to draw the 
graph y = 2+3x and needs to be told that ‘the line rises three units vertically for every positive 
horizontal unit change’. 

Though we are introduced to sampling, theoretical frequency distributions, testing hypotheses, 
correlation and regression it is not surprising that since a page of detail needs be devoted to the descrip- 
tion and drawing of a line there is little hope of a mature discussion on the principle of least squares 
and its application to line fitting. The last four chapters on the chi-square distribution, non-parametric 
tests, analysis of variance and time series are written in similar vein as special topics. Each chapter 
concludes with a set of examples. 

The book is very poor in references to either classical or contemporary authors in statistics; this is 
a pity because, although ‘Elementary Statistics’ is intended to be a descriptive work, it should at 
least make some provision somewhere for the reader who is sufficiently stimulated to ask the question 
‘why’. J. G. SAW 


Sampling Methods for Censuses and Surveys. By F. Yates. Third edition, revised 
and enlarged. 1960. Pp. xvi+440. 54s. 


This is the third edition of a book which first appeared in 1949. It appears to be a reprint of the 
second edition (1953) with the addition of Chapter 11 on the use of electronic computers in the analysis 
of censuses and surveys. In this last chapter the general principles governing the working of electronic 
computers and their programming are set out and useful instructions are given regarding their applica- 
tion to problems considered previously. F. N. DAVID 


Statistische Methoden der Populationsgenetik. By Henri Louis Le Roy. Basel and 
Stuttgart: Birkhäuser Verlag. 1960. Pp. 397. 67.50 Sw. Fr. 


To the English or American reader, the title of this book may be somewhat misleading as it does not 
deal with the subject developed by Fisher, Haldane and Sewall Wright and thus is not comparable 
with Li’s (1955) book Population Genetics. Rather it covers a few aspects of quantitative genetics, the 
analysis of variance as a means for estimating the effects of genetical and environmental factors and 
the application of this knowledge to the selective practices of breeders. It thus provides German- 
speaking readers with an introduction and some elaborations of the methods of Lush, Kempthorne 
and to some extent of Mather, Lerner and Falconer. As such it appears quite excellent. Starting in 
a fairly elementary manner it leads on to many detailed descriptions of more complex situations 
illustrating them by some numerical working examples. Sixty-five line drawings provide valuable 
help for visualizing the logical structure of many of the calculations. Another unusual feature of the 
book is the substitution at the end of three tables of contents for a conventional index. In addition 
to a list of references used there is a large additional bibliography covering a wider field than the text. 

The book is beautifully produced. H. KALMUS 


Symposia of the Society for Experimental Biology. No. XIV. Models and Analogues 
in Biology. Ed. J. W. L. Beament. Cambridge University Press. 1960. Pp. 255. 50s. 


The eighteen contributions in this volume are exceedingly diverse in content—though there is not 
a single botanical contribution—and to the reviewer’s mind also vary in relevance. A couple of quite 
excellent specialist papers could have been published elsewhere without raising any suspicion that 
they had anything to do with models or analogues; the pertinence of an introductory paper by the 
most venerable father of atomic theory is also problematical, while the editor states in the general 
conclusions to his own contribution that ‘Practical biophysical models can be particularly useful in 
the investigation of the properties of non-living organic materials and systems’. The other authors are 
more to the point, and very refreshingly they approach their tasks from quite different directions. 
Only a few attempt elaborate definitions of what they mean by a model or by an analogue, though 
most use the terms to include mathematical or logical hypotheses and imaginative propositions, as 
well as—sometimes rather grudgingly—a few devices, which a small boy might recognize as models. 

The following branches of biology are considered: genetics, development, tissue culture, muscle 
physics, animal locomotion, flight, behaviour, neurology and biology teaching. And the models and 
analogues described are derived from statistics, kinetics, electrical engineering, cybernetics, biochemistry 
and biophysics as well as from other biological specialities. Readers of this journal might be profession- 
ally interested in the contribution of J. H. Westcott on the ‘estimation of values of parameters of a 
model to conform with observations’. 

The volume is well produced but perhaps a little expensive. H. KALMUS 


Numerical Methods of Curve Fitting. By P. G. Guest. London: Cambridge University 
Press. 1961. Pp. xii+422. 80s. 


This is a book written by a physicist primarily for other physicists. The title is perhaps a little 
misleading. This is essentially a book on the fitting of regression curves and there is, for instance, 
scarcely any mention of the fitting of frequency functions or of time series analysis. 

The book is in three parts: 

Part I (82 pp.). Distribution theory, tests of hypotheses, confidence limits, etc. 

Part II (64 pp.). Straight line regression. 

Part III (263 pp.). Polynomial and other regression. 

It is the third part which will be of most value since the first two parts are more fully covered in 
many of the standard introductions to statistical theory. The third part includes, for instance, the 
fitting of power series, of orthogonal polynomials when the observations are equally spaced including 
the case where some of the values are missing, the use of grouped observations, of harmonic analysis 
and of multiple regression. 

For someone who has frequently to fit curves to sets of data this should prove an excellent handbook. 
Every method is described in detail and accompanied by examples, calculating schemes and in many 
places by tables of coefficients which the author has obviously found useful in his own work. Derivations 
are given of almost all of the results required. The author does not use matrix notation but since 
much of the literature on regression uses it, there is a section devoted to an explanation of the use of 
matrices. ALAN J. MILLER 


Characteristic Functions. By E. Lukacs. Griffin’s Statistical Monographs and Courses, 
no. 5 (Double Volume). London: C. Griffin and Co. 1960. Pp. 216. 38s. 


This book is an account, from the standpoint of pure mathematics, of the theory of characteristic 
functions of probability distributions on the real line. The first four chapters treat the basic theory. 
The rudiments of distribution function theory are first reviewed without proofs. The characteristic 
function and the moments are then defined, and their elementary properties established. The unique- 
ness theorem is proved, without inversion, by an approximation method, and then follows an elementary 
(i.e. without dominated convergence) proof of the inversion formula. The convolution formula and the 
continuity theorem are proved. Then simple necessary conditions are found for a function to be a 
characteristic function, and Bochner’s characterization (i.e. ‘Bochner’s theorem’) is proved together 
with three other less well-known characterizations. These four chapters also treat the most familiar 
distributions as examples and discuss several special classes of distributions (lattice, absolutely 
continuous, singular, etc.). 

The author does not consider applications of the theory, and he excludes the central limit theorem 
and results of that type (so fully treated by B. V. Gnedenko & A. N. Kolmogorov: Limit Distributions 
for Sums of Independent Random Variables, Addison-Wesley, Cambridge, Mass., 1954). The rest of 
the book (more than half) is instead devoted almost entirely to factorization problems. 

Chapter V treats the elementary properties of infinitely divisible distributions and then the canonical 
representations of Lévy-Khintchine, Lévy, and Kolmogorov. The standard examples are cited, and 
stable distributions briefly considered. The next chapter treats the general factorization problem. 
More detailed results are obtained in the succeeding two chapters devoted to analytic characteristic 
functions (i.e. those admitting analytic extension to a horizontal strip in the complex plane). The 
analytic behaviour of the characteristic function is related to properties of the distribution. Necessary 
conditions for analytic functions to be characteristic functions are found (including results of the 
author and the theorem of Marcinkiewicz) by application of the theory of entire functions. A remark- 
able result of Zinger and Linnik about analytic characteristic function solutions of differential equations 
is proved. The well known theorems of Cramér and Raikov on factorization of normal and Poisson 
distributions are proved. Other problems are considered in these chapters, with many examples. 
There is a brief final chapter on integral transforms of distribution functions. 

The book is a comprehensive and well-organized account of a subject in which the modern sources 
are widely scattered, and will become a standard reference for all users of characteristic function 
theory. Prospective buyers should be warned that the first printing contained many minor errors; 
the text has now been reset. D. A. EDWARDS 


The Elliptic Functions as they should be. By Albert Eagle. Cambridge: Galloway 
and Porter. 1958. Pp. 508. 45s. 


Nearly all statisticians require the properties of elliptic functions at some time or other in their 
career and most of these have to resort hastily to a text-book to refresh their memories or to learn 
anew. The difficulty has been until now to find a book which will so supplement one’s mathematical 
education without burdening oneself unduly with techniques irrelevant to one’s main purpose. Mr Eagle 
has written an excellent treatise which contains all that the user of applicable mathematics will want 
and in which he delineates the theory in an easily comprehensible way. 

In the book he covers twenty elliptic functions and their forty trigonometric series, derivatives and 
squares of elliptic functions, theta and peeta functions, the addition theorems, elliptic integrals of the 
first, second, third and fourth kinds, the kay and eee functions, the Weierstrassian poleless functions, 
applications of the elliptic functions in geometry and mechanics, conformal representation and the 
elliptic functions in it. There are examples mostly of the ‘prove that’ variety with a short line where 
the author thinks the proof may cause difficulty. The reviewer was not helped, however, by the remark 
at the end of the statement of a problem that this ‘is extraordinarily easy’. 

This book can be unreservedly recommended to all students and research workers seeking for 
knowledge of the elliptic functions. The author has certain eccentricities of nomenclature and notation 
(such, for example, as writing ½π as τ and calling it ‘hi’) but these are easily learnt and should not 
put anyone off learning from him. F. N. DAVID 


Finite-difference Methods for Partial Differential Equations. By George E. 
Forsythe and Wolfgang R. Wasow. London and New York: John Wiley and 
Sons, Inc. 1960. Pp. 444. 92s. or $11.50. 


When a partial differential equation is replaced by an approximating finite difference equation, an 
error is introduced, the estimation of which is as formidable a task for a mathematician as the solution 
of the difference equation is for even a high speed electronic computer. The accumulation of rounding-off 
errors is also very difficult to control. The authors seem to have explained this difficult theory as 
lucidly as possible. 

The reader need not know anything about partial differential equations but should know enough 
analysis for an honours degree in mathematics; and some experience of using electronic computers on 
easier problems would help. A reader not so qualified will find that Lebesgue integration is used but 
not defined; and that the introductory section on computers assumes without explicitly saying so 
that they use the binary scale. G. L. WATSON 


The Calculus of Finite Differences. By George Boole. Fourth edition. New York: 
Chelsea Publishing Co. 1960. Pp. vii+336. $1.39 (paper), $3.95 (cloth). 


Boole’s Finite Differences was written in 1860 and has stood the test of time. It is still in the 
reviewer's opinion the best text-book from which to learn this branch of mathematical statistics and 
probability, and the Chelsea Publishing Company are to be congratulated on reprinting it as a 
paper-back. It may be recalled that the work covers interpolation, quadrature, the summation of 
series, Bernoulli numbers and factorial powers, difference-equations of the first order, linear difference- 
equations with constant and with variable coefficients, mixed and partial difference-equations. Exer- 
cises and answers are given. The price makes it possible for all students to possess this standard work. 

F. N. DAVID 


Fundamentals of College Algebra. By W. H. Durfee. New York and London: The 
Macmillan Co., New York. 1960. Pp. xi+250. 31s. 6d. 


This is a text-book of classical algebra presented from the modern point of view. The topics covered 
are sets, numbers, equations, complex numbers, polynomials, inequalities, groups, exponents and 
logarithms. It will be of little interest to would-be students of statistics (with the exception perhaps of 
the last chapter on the approximate solution of equations) since the topics discussed are not advanced 
far enough. Exercises and solutions are given. F. N. DAVID 


Other books received 


Statistical Theory and Methodology in Science and Engineering. By K. A. Brownlee. London 
and New York: John Wiley and Sons. 1960. Pp. xv+570. 134s. or $16.75. 


Mathematical Thinking in the Measurement of Behavior. By James S. Coleman, Ernest W. 
Adams and Herbert Solomon (Ed.). Illinois: The Free Press of Glencoe. 1961. Pp. 314. $7.50. 


Developments in Mathematical Psychology. By R. Duncan Luce (Ed.), Robert R. Bush and 
J. C. R. Licklider. Illinois: The Free Press of Glencoe. 1960. Pp. 294. $7.50. 


Field Computations in Engineering and Physics. By A. Thom and Colin J. Apelt. Foreword 
by G. F. J. Temple. London: D. Van Nostrand Co. 1961. Pp. 165. 30s. 


General Systems. Yearbook of the Society for General Systems Research. Vol. V. Michigan: Society 
for General Systems Research. 1960. Pp. 235. $7.50. 


Sample Design in Business Research. By W. Edwards Deming. London and New York: John 
Wiley & Sons. 1960. Pp. 517. 96s. 


Projective Geometry of n Dimensions. By Otto Schreier and Emanuel Sperner. New York: 
Chelsea Publishing Co. 1961. Pp. 208. $4.95. 


Textbook of Algebra. Reprints of Vols. I and II. 6th edition. By G. Chrystal. New York: Chelsea 
Publishing Co. 1959. Vol. I, 571 pp. Vol. II, 616 pp. $2.95 each. 


Quantitative Methods in Pharmacology. Proceedings of a Symposium held in Leyden, May 1960. 
Ed. H. de Jonge. Amsterdam: North-Holland Publishing Co. 1961. Pp. xx+391. 84s. 


Arithmetic: An Introduction to Mathematics. By L. Clark Lay. New York and London: The 
Macmillan Company. 1961. Pp. 323. 31s. 6d. 


Decomposition of Superpositions of Distribution Functions. By Pál Medgyessy. Budapest: 
Publishing House of the Hungarian Academy of Science. 1961. Pp. 227. 


Lehrbuch der Praktischen Statistik. By Peter Quante. Berlin: Walter de Gruyter and Company. 
1961. Pp. 443. DM30. 


Analysing Qualitative Data. By A. E. Maxwell. London: Methuen & Co; New York: John Wiley 
and Sons. 1961. Pp. 163. 16s. 


L’Anglais Economique Dictionnaire de Concepts. Paris: Editions Cujas. 1961. Pp. 199. 


International Work in Health Statistics, 1948-1958. By H. S. Gear, Y. Biraud and S. Swaroop. 
Geneva: World Health Organization. 1961. Pp. 56. 3s. 6d., $0.60 or Sw. Fr. 2. 


A First Course in Mathematical Statistics. By C. E. Weatherburn (Reprinted in paper-back 
edition). Cambridge: The University Press. 1961. Pp. 277. 17s. 6d. 


Probability. A First Course. By Frederick Mosteller, Robert E. K. Rourke and George B. 
Thomas. Massachusetts and London: Addison-Wesley Publishing Company. 1961. Pp. 319. 30s. 


Probability with Statistical Applications. By Frederick Mosteller, Robert E. K. Rourke and 
George B. Thomas. Massachusetts and London: Addison-Wesley Publishing Company. 1961. 
Pp. 478. 38s. 


Tables of the Hypergeometric Probability Distribution. Stanford Studies in Mathematics and 
Statistics. III. By Gerald J. Lieberman and Donald B. Owen. Stanford: University Press; 
Oxford: University Press. 1961. Pp. 726. £6. 


Markov Learning Models for Multiperson Interactions. By Patrick Suppes and Richard C. 
Atkinson. Stanford: University Press; Oxford: University Press. 1961. Pp. 296. 66s. 


General Notes drawn up for the Guidance of Authors on the 
Preparation of Manuscripts for Biometrika 


Papers should be headed by an informative title, by the name or names of the author(s), 
and where appropriate by the name and address of the laboratory or institution where 
the work was performed. Manuscripts should normally be typed in double spacing 
with wide margins and on one side of the paper only; they should be on sheets of 
uniform size, numbered consecutively. In addition to the top copy it is editorially 
convenient, though not essential, that a second copy of the paper be submitted to the 
journal. It is helpful if a shortened version of the title, not exceeding 65 letters and 
spaces, or about eight words in length, could be supplied for a page-head. 


Special care should be taken with mathematics, and any complicated mathematical 
expressions should be inserted very clearly in the manuscript. Where algebraic expres- 
sions are written in, particular attention should be given to distinguishing between 
capitals and lower-case letters, e.g. S, s, etc. Where Greek letters or unusual symbols 
are used it may help the printer if they are distinguished by writing or underlining in 
coloured ink, and a key provided for the compositor. Alternatively each unusual letter 
may be indicated by name in the margin at the first mention, e.g. ‘μ = Greek mu’, 
‘R_F = italic capital R and sub-script italic capital F’. Letters in script should also be 
indicated in the margin. Equation numbers should be placed after, not before, the 
equation. 


Tables, except very small ones, should be on separate sheets of paper, since they are 
not set up at the same time as the body of the text; the position at which they are to be 
inserted should be indicated on the manuscript. They should have headings or footnotes 
which make their general meaning comprehensible to the reader without reference to 
the text. 


Illustrations should be kept to a minimum. The author’s name and the number of 
the figure should be written on the back of each, with legends on a separate sheet. 
Diagrams that are not drawn by professional draughtsmen generally have to be re-drawn 
and re-lettered before blocks are made, and for this reason it is advisable to draw 
illustrations carefully in pencil from which the printer will make good ink drawings. It is 
most important that no mistake should be made in any part of the author’s original 
drawing. 


The Harvard system is used for references in this journal, references being collected 
at the end of the paper in alphabetical order of author. Each reference must give name 
followed by initials of author, the year of publication in brackets, title of paper, journal 
title abbreviated in accordance with the World List of Scientific Periodicals, volume 
number in arabic numerals with wavy line to indicate black type and the number of the 
first and last page in arabic numerals. 


Papers should be submitted in their final form so that proof correction is confined 
to a minimum. 


BIOMETRIKA PUBLICATIONS 


Issued by the Cambridge University Press, Bentley House, London, N.W. 1 
and obtainable from any bookseller 


Tables of the Incomplete B-Function Edited by KARL PEARSON 
59 pages of Introduction and 494 pages of Tables Price: 70s. net 


Tables of the Incomplete Γ-Function Edited by KARL PEARSON 
31 pages of Introduction and 164 pages of Tables Price: 42s. net 


Tables of the Complete and Incomplete Elliptic Integrals 
(from LEGENDRE’S Traité des Fonctions Elliptiques. With autographed portrait of LEGENDRE) 
39 pages of Introduction by KARL PEARSON and 94 pages of Tables Price: 12s. 6d. net 


Tables of the Ordinates and Probability Integral of the Distribution of 
the Correlation Coefficient in Small Samples By F. N. DAVID 
38 pages of Introduction, 55 pages of Tables, 10 Diagrams and 4 Charts Price: 17s. 6d. net 


Biometrika Tables for Statisticians, Vol. I 
EDITED By E. S. PEARSON and H. O. HARTLEY for the Biometrika Trust 
102 pages of Introduction and 136 pages of Tables Price: 25s. net 


The Life, Letters and Labours of Francis Galton, Vols. I, II, IIIA & IIIB 
By KARL PEARSON, F.R.S. Price: £5. 5s. net 


Karl Pearson: An Appreciation of Some Aspects of his Life and Work 
By E. S. PEARSON Price: 15s. net 


A Bibliography of the Statistical and Other Writings of Karl Pearson 
Compiled by G. M. MORANT, with the assistance of B. L. WELCH Price: 6s. net 


“Student’s” Collected Papers Edited by E. S. PEARSON and 
JOHN WISHART with a Foreword by LAUNCE McMULLEN Price: 21s. net 


Karl Pearson’s Early Statistical Papers 


Reprinted by photo-lithography for the Biometrika Trust, with the permission of the original publishers. 
The Volume contains eleven papers, including the more important of the memoirs entitled ‘‘ Mathematical 
Contributions to the Theory of Evolution”, first published in the Philosophical Transactions of the Royal 
Society. The original paper deriving the χ²-distribution, published in 1900 in the Philosophical Magazine, is 
also included. Price: 25s. net 


PUBLICATIONS OF THE DEPARTMENT OF 


STATISTICS, UNIVERSITY COLLEGE, LONDON 


Issued by the Cambridge University Press, Bentley House, London, N.W. 1 
and obtainable from any bookseller 


TRACTS FOR COMPUTERS 


. Tables of the Digamma and Trigamma Functions. By ELEANOR PAIRMAN, M.A. 
Tables for summing $S = \sum_i \frac{1}{(p_1 i + q_1)(p_2 i + q_2) \cdots (p_n i + q_n)}$, where the p’s and q’s are numerical 
factors. Price 7s. 6d. net. 


. Table of Coefficients of Everett’s Central-Difference Interpolation Formula. By A. J. 
THOMPSON, PH.D. Second edition. Price 7s. 6d. net. 


. Table of the Logarithms of the Complete Γ-Function (to ten decimal places) for Argument 
2 to 1200 beyond Legendre’s Range (Argument 1 to 2). By EGON S. PEARSON, D.Sc. 
Price 7s. 6d. net. 


. Log Γ(x) from x = 1 to 50.9 by intervals of 0.01. By JOHN BROWNLEE, M.D., D.Sc. 
Price 7s. 6d. net. 


. On Quadrature and Cubature or on Methods of Determining Approximately Single and 
Double Integrals. By J. O. IRWIN, D.Sc. Price 7s. 6d. net. 


. Tables of the Probable Error of the Coefficient of Correlation. By KARL HOLZINGER, PH.D. 
Price 7s. 6d. net. 


. Bibliotheca Tabularum Mathematicarum, being a Descriptive Catalogue of Mathematical 
Tables. Part I. A, Logarithms of Numbers. By JAMES HENDERSON, PH.D. Price 9s. net. 


. Random Sampling Numbers. By L. H. C. Tippett, M.Sc., with a Foreword by KARL 
PEARSON. Price 7s. 6d. net. 


. Tables of tan⁻¹ x and log (1 + x²). To assist in the calculation of the ordinates of a Pearson 
Type IV curve. By L. J. COMRIE, PH.D. Price 7s. 6d. net. 


. Random Sampling Numbers (2nd Series). By M. G. KENDALL and B. BABINGTON SMITH. 
Price 7s. 6d. net. 


. Random Normal Deviates. By HERMAN WOLD. Price 7s. 6d. net. 


. Correlated Random Normal Deviates. By E. C. FIELLER, T. LEWIS and E. S. PEARSON. 
Price 10s. 6d. net. 
Nos. II, III, IV, VI and VII are out of print 


LOGARITHMETICA BRITANNICA 


A standard Table of Logarithms to Twenty Decimal Places. By A. J. THOMPSON, Ph.D. 
(commenced in 1922 to commemorate the tercentenary of the publication of HENRY BRIGGS’S 
Arithmetica Logarithmica). 
The nine separate sections of this Table have now been issued, and the complete work 
consisting of the logarithms of numbers 10,000-100,000, together with Dr Thompson’s 
General Introduction (98 pp.), is available in two bound volumes. 


Price £8. 8s. Od. 


NEW STATISTICAL TABLES: SEPARATES RE-ISSUED 
FROM BIOMETRIKA 


To be obtained from 
BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON, W.C.1 


1. From Biometrika, Vols. 22, 27 and 28 
Tests of Normality. By E. S. PEARSON and R. C. GEARY Price 2s. 6d., post free 


2. From Biometrika, Vol. 32, pp. 168-181 and 188-189 
(1) Table of percentage points of the incomplete beta-function 
(2) Table of percentage points of the χ² distribution 
Stitched together with introductory matter. Price 2s. 6d., post free 


3. From Biometrika, Vol. 32, pp. 300-310 
(1) Table of the probability integral of the range in samples from a normal population 
(2) Table of the percentage points of the range 
(3) Table of the percentage points of the t-distribution 
Stitched together with introductory matter. Price 2s. 6d., post free 
4. From Biometrika, Vol. 33, pp. 73-88 
Table of percentage points of the inverted beta (F) distribution 
With introductory matter. Price 2s. 6d., post free 
5. From Biometrika, Vol. 33, pp. 252-265 
(1) Table of the probability integral of the mean deviation in samples from a normal population 
(2) Table of the percentage points of the mean deviation 
Stitched together with introductory matter. Price 2s. 6d., post free 
6. From Biometrika, Vol. 33, pp. 296-304 
Table for testing the homogeneity of a set of estimated variances 
With introductory matter. Price 2s., post free 


7. From Biometrika, Vol. 35, pp. 145-156 
Table of significance levels for the Fisher-Yates test of significance in 2 x 2 contingency 
tables. By D. J. FINNEY With introductory matter. Price 2s. 6d., post free 

8. From Biometrika, Vol. 35, pp. 191-201 
Table for the calculation of working probits and weights in probit analysis. By D. J. FINNEY 
and W. L. STEVENS With introductory matter. Price 2s. 6d., post free 


9. From Biometrika, Vol. 36, pp. 267-289 
Tables of autoregressive series. By M.G. KENDALL With introductory matter. Price 2s. 6d., post free 


10, 14, 18, and 20. From Biometrika, Part 1 from Vol. 36, pp. 431-449, Parts 2 and 3 from 
Vol. 38, pp. 435-462, Part 4 from Vol. 40, pp. 427-446, and Part 5 from Vol. 42, pp. 223-242 
Tables of symmetric functions. By F. N. DAVID and M. G. KENDALL 

With introductory matter. Price 14s. 6d., post free 
(Part 1, 2s. 6d.; Parts 2 and 3, 4s.; Part 4, 4s.; Part 5, 4s.) 

11. From Biometrika, Vol. 39, p. 190 and Vol. 43, pp. 449-451 
Tables of percentage points of the extreme “Studentized” deviate from the sample mean. 
By K. R. NAIR and H. A. DAVID With introductory matter. Price 1s., post free 

12. From Biometrika, Vol. 37, pp. 168-172 and pp. 313-325 
(1) Table of the probability integral of the t-distribution 
(2) Table of the χ² integral, and of the cumulative Poisson distribution. By H. O. HARTLEY 
and E. S. PEARSON Stitched together with introductory matter. Price 5s., post free 
13. From Biometrika, Vol. 38, pp. 112-130 
Charts of the power function for analysis of variance tests, derived from the non-central 
F-distribution. By E. S. PEARSON and H. O. HARTLEY 
With introductory matter. Price 2s. 6d., post free 
15. From Biometrika, Vol. 38, pp. 423-426 
A chart for the incomplete beta-function and the cumulative binomial distribution. By H. O. 
HARTLEY and E. R. FITCH With introductory matter and ruler scale. Price 2s. 6d., post free 
16. From Biometrika, Vol. 40, pp. 70-73 
Tables of the angular transformation. By W. L. STEVENS 
With introductory matter. Price 1s., post free 


17. From Biometrika, Vol. 40, pp. 74-86 
Tests of significance in a 2x2 contingency table: extension of Finney’s table (No. VII). 
Computed by R. LATSCHA With introductory matter. Price 2s. 6d., post free 
19. From Biometrika, Vol. 41, pp. 253-260 
Tables of generalized k-statistics. By S. H. ABDEL-ATY With introductory matter. Price 2s., post free 


21. From Biometrika, Vol. 42, pp. 494-511 
A new form of table for significance tests in a 2x2 contingency table. By P. ARMSEN 
With introductory matter. Price 2s. 6d., post free 
22. From Biometrika, Vol. 43, pp. 388-403 
Tables for certain applications of sequential methods in the analysis of variance. By W. D. RAY 
With introductory matter. Price 2s. 6d., post free 


23. From Biometrika, Vol. 43, pp. 423-435 
Table for determining confidence limits for a proportion in binomial sampling. By EDWIN L. 
CROW With introductory matter. Price 2s. 6d., post free 
24. From Biometrika, Vol. 44, pp. 411-419 
Tables for estimating the normal distribution function of normit analysis. Part 1. Tables and 
description of their use. By JOSEPH BERKSON With introductory matter. Price 2s. 6d., post free 
25. From Biometrika, Vol. 44, pp. 482-489 
Table of significance points for a two-sample t-test based on range. By P. G. MOORE 
With introductory matter. Price 2s. 6d., post free 
26. From Biometrika, Vols. 44 & 45 
Tables of the upper percentage points of the generalized beta distribution. By F. G. FOSTER 
and D. H. REES With introductory matter. Price 5s., post free 
27. From Biometrika, Vol. 46, pp. 178-204 
Tables of 1000 standardized random deviates from certain non-normal distributions. By M. H. 
QUENOUILLE. With a note by E. S. Pearson With introductory matter. Price 2s. 6d., post free 
28. From Biometrika, Vol. 46, pp. 441-453 
Table of confidence limits for the expectation of a Poisson variable. By EDWIN L. CROW and 
ROBERT S. GARDNER With introductory matter. Price 2s. 6d., post free 
29. From Biometrika, Vol. 47, pp. 121-142 
Nomograms for fitting the logistic function by maximum likelihood. By JOSEPH BERKSON 
With introductory matter. Price 5s., post free 
30. From Biometrika, Vols. 47 & 48 
Tables of Neyman-shortest unbiased confidence intervals: (a) for the binomial parameter; 
(b) for the Poisson parameter. By COLIN R. BLYTH and DAVID W. HUTCHINSON 
With introductory matter. Price 2s. 6d., post free 





Please send remittance, made payable to Biometrika, crossed ‘a/c BIOMETRIKA TRUST’ to the SECRETARY, BIO- 
METRIKA OFFICE, UNIVERSITY COLLEGE, GOWER STREET, LONDON, W.C.1, with order. To pay for an order 
in dollars, add 2s. for bank charges to the total sum due in sterling before converting to dollars. [E.g. For one copy each 
of Tables VII, XVII and XXI pay 7s. 6d. sterling, but 9s. 6d. = $1.33 (@ $2.80 to £1) if paying by dollar cheque.] 
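
The dollar-payment rule above can be checked with a short calculation. The following is only an illustrative sketch: the function names and the use of pence as the working unit are assumptions of this example, not part of the notice. It relies on the pre-decimal relations 12d. = 1s. and 20s. = £1, the 2s. bank charge, and the quoted rate of $2.80 to £1.

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240d. = £1


def sterling_to_pence(pounds=0, shillings=0, pence=0):
    """Convert a pre-decimal sterling amount to pence (hypothetical helper)."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence


def dollars_due(order_pence, rate=Fraction(280, 100), bank_charge_shillings=2):
    """Add the bank charge in sterling first, then convert the total to dollars."""
    total_pence = order_pence + bank_charge_shillings * PENCE_PER_SHILLING
    return round(float(Fraction(total_pence, PENCE_PER_POUND) * rate), 2)


# Three tables at 2s. 6d. each come to 7s. 6d., as in the notice's example.
order = 3 * sterling_to_pence(shillings=2, pence=6)
print(dollars_due(order))
```

The bank charge is added before conversion, as the notice specifies, so the 7s. 6d. order becomes 9s. 6d. before the exchange rate is applied.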


SEPARATES RE-ISSUED FROM BIOMETRIKA 


To be obtained from 
BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON, W.C.1 


The Biometrika Office has available for sale copies of many of the papers which have been published in Biometrika (right back to Volume 1). Readers 
requiring offprints of any papers are invited to enquire from the above address whether they are still available. The following are a few examples 
of fairly recent separates of which copies may be obtained: 


From Biometrika, Vol. 39, pp. 324-345 
Rank Analysis of Incomplete Block Designs I. The Method of Paired Comparisons. By R. A. 
BRADLEY and M. E. TERRY Price 4s., post free 
From Biometrika, Vol. 41, pp. 502-537 
Rank Analysis of Incomplete Block Designs II. Additional Tables of Paired Comparisons. By 
R. A. BRADLEY Price 5s., post free 
From Biometrika, Vol. 43, pp. 203-205 
Further critical values for the two-means problem. By W. H. TRICKETT, B. L. WELCH and G. S. 
JAMES Price 1s., post free 
From Biometrika, Vol. 44, pp. 1-8 
John Wishart, 1898-1956. Obituary Notice and Bibliography. By E. S. PEARSON Price [amount illegible], post free 
From Biometrika, Vol. 44, pp. 490-514 
A bibliography on the theory of queues. By ALISON DOIG Price 5s., post free 
From Biometrika, Vol. 45, pp. 293-315 
THOMAS BAYES’S Essay towards solving a problem in the doctrine of chances. [Reproduced 
from Phil. Trans. Roy. Soc. 1763, 53, 370-418.] With a biographical note by G. A. BARNARD Price 5s., post free 
From Biometrika, Vol. 45, pp. 521-543 
A bibliography on life testing and related topics. By WILLIAM MENDENHALL Price 5s., post free 
BIOMETRIKA INDEX. Comprising Subject Index for Vols. 1-37 and Author Index for Vols. 
1-40, with Author Index Supplement covering Vols. 41-43. Price 6s. or $1.00, post free 

STATISTICAL EXERCISES 


Issued by the DEPARTMENT OF STATISTICS 
UNIVERSITY COLLEGE LONDON 


Elementary Statistical Exercises. By F. N. DAVID and E. S. PEARSON 
Published by the Cambridge University Press 
These exercises were collected in connection with the numerical classwork undertaken by students 
during the first year of the B.Sc. Special Degree course in Statistics at University College. They 
deal with applications of the more elementary univariate and bivariate theory. 
Answers are provided. Price 13s. 6d. 


Statistical Exercises (Part I). Analysis of variance and associated techniques. 
Compiled by N. L. JOHNSON 


This volume includes over 100 exercises on the following topics: 